Centralized management for PostgreSQL High Availability clusters across hundreds of edge Kubernetes clusters. Automatic failover, split-brain prevention, and deep observability from a single dashboard.
You can't reach edge clusters directly, so traditional push-based management doesn't work.
A round-trip to a central control plane during a primary failure adds unacceptable latency.
Two primaries writing to the same cluster corrupts data permanently.
Managing hundreds of PG clusters across edge sites with kubectl alone is impractical.
Satellites connect outward via persistent gRPC streams. No inbound access required.
Leader election via K8s Leases. Promotion in seconds with no central dependency.
Immediate write blocking, automatic pg_rewind, zero-restart container recovery.
Web dashboard with real-time health, replication lag, slow queries, and one-click switchover.
Everything you need to run PostgreSQL HA at the edge, built from scratch in Go.
Per-pod sidecar with Kubernetes Lease-based leader election. Primary failure detected in 15 seconds, promotion in under 5.
SQL fencing blocks writes and kills connections instantly. Old primary auto-demotes to standby via K8s exec. No data corruption.
After failover, replicas auto-detect timeline divergence and recover via pg_rewind or re-basebackup. No manual intervention.
Replication lag, connections, disk, WAL stats, per-database cache hit ratios, table stats, and slow queries from pg_stat_statements.
One-click primary promotion from the dashboard. The sequence is CHECKPOINT, fence, lease transfer, promote, with step-by-step confirmation.
Reusable cluster templates with deployment rules that auto-deploy to satellites matching label selectors. Fleet-scale management.
Builds StatefulSets, Services, ConfigMaps, Secrets, and RBAC from scratch. No CRDs, no operator frameworks, minimal footprint.
Token-based auth with SHA-256 hashing. Constant-time comparison. Create-only secrets. Identity stored in K8s Secrets.
The Postgres container runs in a supervisor loop. Demotion, timeline recovery, and restart happen in-place without incrementing the pod's Kubernetes restart count.
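The token handling described above (SHA-256 hashing with constant-time comparison) could look roughly like the following. This is a minimal sketch using only the Go standard library, not pg-swarm's actual code: the names NewToken, HashToken, and VerifyToken are hypothetical, and the 32-byte token length and hex encoding are assumptions.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// NewToken returns a random 32-byte token, hex-encoded.
// Hypothetical helper; the token length is an assumption.
func NewToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// HashToken returns the hex-encoded SHA-256 digest of a token.
// Only this digest would be stored on the server side.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// VerifyToken hashes the presented token and compares the result with the
// stored digest in constant time, so the comparison does not leak how many
// leading bytes matched.
func VerifyToken(presented, storedHash string) bool {
	got := HashToken(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	token, err := NewToken()
	if err != nil {
		panic(err)
	}
	stored := HashToken(token)
	fmt.Println("valid token accepted:", VerifyToken(token, stored))
	fmt.Println("wrong token accepted:", VerifyToken("not-the-token", stored))
}
```

Storing only the digest means a leaked metadata store does not directly reveal usable tokens.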
Three Go binaries. Bidirectional gRPC streaming. No external dependencies beyond PostgreSQL.
cmd/central
gRPC server for satellite streams. REST API with 30+ endpoints. Embedded React dashboard. PostgreSQL metadata store with auto-migrations.
cmd/satellite
Lightweight agent per edge cluster. Persistent gRPC stream with auto-reconnection. Kubernetes operator that builds PG clusters from JSON configs.
cmd/failover-sidecar
Per-pod sidecar for leader election via K8s Leases. Detects split-brain, fences writes, demotes old primary, recovers timeline divergence.
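The satellite's auto-reconnection can be pictured as a retry loop with capped exponential backoff. The sketch below is an assumption about the general shape, not the project's implementation: runWithReconnect and nextBackoff are hypothetical names, the connect callback stands in for opening and serving one gRPC stream, and the delays are illustrative.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nextBackoff doubles the current delay, capped at max.
func nextBackoff(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// runWithReconnect calls connect until ctx is cancelled. connect is expected
// to block for the lifetime of one stream and return when the stream ends.
// After a failure the wait grows exponentially; after a clean end it resets.
func runWithReconnect(ctx context.Context, connect func(context.Context) error, initial, max time.Duration) error {
	delay := initial
	for {
		err := connect(ctx)
		if ctx.Err() != nil {
			return ctx.Err()
		}
		if err == nil {
			delay = initial // stream ended cleanly; retry promptly
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		if err != nil {
			delay = nextBackoff(delay, max)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	attempts := 0
	// Stand-in for dialing central: fails three times, then shuts down.
	connect := func(context.Context) error {
		attempts++
		if attempts == 4 {
			cancel()
			return nil
		}
		return errors.New("central unreachable")
	}
	err := runWithReconnect(ctx, connect, time.Millisecond, 8*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Because the satellite always dials outward, this loop is the only thing that has to survive a network partition; central never needs a route back to the edge.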
From primary failure to full recovery in under 20 seconds, with zero manual intervention.
The primary pod crashes or becomes unreachable. The failover sidecar's lease renewal stops.
After 15 seconds without renewal, the leader lease expires. All replica sidecars detect this on their next tick.
One replica wins the lease via optimistic locking (resourceVersion). The others see the conflict and back off.
The winning replica calls pg_promote(), transitions to primary, and labels its pod pg-swarm.io/role=primary.
The Kubernetes RW Service selector picks up the new primary label. Applications reconnect transparently.
Other replicas detect the timeline divergence, run pg_rewind (or re-basebackup), and start streaming from the new primary. No container restarts.
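The lease race in the steps above relies on optimistic locking: every update must carry the resourceVersion the writer last read, and the API server rejects stale writes. The sketch below simulates that with an in-memory store instead of the real Kubernetes API, so Store, Lease, TryAcquire, and ErrConflict are illustrative stand-ins; only the 15-second expiry comes from the text.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrConflict mimics the API server rejecting a write based on a stale read.
var ErrConflict = errors.New("resourceVersion conflict")

// Lease is a simplified stand-in for a Kubernetes Lease object.
type Lease struct {
	Holder          string
	RenewedAt       time.Time
	ResourceVersion int
}

// Store mimics the API server's compare-and-swap on resourceVersion.
type Store struct {
	mu    sync.Mutex
	lease Lease
}

// Get returns a snapshot of the current lease.
func (s *Store) Get() Lease {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.lease
}

// Update succeeds only if l is based on the current version.
func (s *Store) Update(l Lease) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if l.ResourceVersion != s.lease.ResourceVersion {
		return ErrConflict
	}
	l.ResourceVersion++
	s.lease = l
	return nil
}

// TryAcquire claims the lease if the observed snapshot has expired.
// It returns (false, nil) while the lease is still held and
// (false, ErrConflict) if another candidate wrote first.
func TryAcquire(s *Store, observed Lease, candidate string, now time.Time, ttl time.Duration) (bool, error) {
	if now.Sub(observed.RenewedAt) < ttl {
		return false, nil
	}
	observed.Holder = candidate
	observed.RenewedAt = now
	if err := s.Update(observed); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	store := &Store{}
	ttl := 15 * time.Second
	now := time.Now()
	// Both replicas read the same expired lease before either writes.
	snapA, snapB := store.Get(), store.Get()
	okA, errA := TryAcquire(store, snapA, "replica-a", now, ttl)
	okB, errB := TryAcquire(store, snapB, "replica-b", now, ttl)
	fmt.Println("replica-a:", okA, errA)
	fmt.Println("replica-b:", okB, errB)
}
```

The losing replica does not need to coordinate with the winner: the conflict error alone tells it to back off and re-read the lease on its next tick.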
Real-time visibility into every PostgreSQL instance across your fleet.
Instance table with role badges, ready/WAL status dots, connection bars, disk usage, and timeline IDs. Expand to see per-pod details.
Drill down to see disk vs WAL breakdown, WAL statistics, per-database sizes with cache hit ratios, table stats, and slow queries.
6-tab configuration editor: General, Volumes, Resources, PostgreSQL params (full catalog with 50+ parameters), HBA Rules, and Databases.
Map profiles to satellites via label selectors. When a rule matches, clusters are auto-created and pushed. Fleet-scale management.
Promote any replica to primary with a detailed confirmation modal showing each step and a downtime warning.
State transitions, failovers, switchovers, and errors with severity icons. Per-cluster event filtering.
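The deployment rules mentioned above map profiles to satellites through label selectors. A minimal sketch of that matching follows, assuming equality-only selectors in the style of Kubernetes matchLabels. The types Satellite and Rule, the functions Matches and Targets, and the sample labels are hypothetical and not taken from pg-swarm.

```go
package main

import "fmt"

// Matches reports whether every key/value pair in selector is present in
// labels. An empty selector matches every label set.
func Matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// Satellite is a hypothetical view of one registered edge cluster.
type Satellite struct {
	Name   string
	Labels map[string]string
}

// Rule is a hypothetical deployment rule binding a profile to a selector.
type Rule struct {
	Profile  string
	Selector map[string]string
}

// Targets returns the names of the satellites a rule applies to,
// in input order.
func Targets(r Rule, sats []Satellite) []string {
	var out []string
	for _, s := range sats {
		if Matches(r.Selector, s.Labels) {
			out = append(out, s.Name)
		}
	}
	return out
}

func main() {
	sats := []Satellite{
		{Name: "store-001", Labels: map[string]string{"region": "eu", "tier": "prod"}},
		{Name: "store-002", Labels: map[string]string{"region": "us", "tier": "prod"}},
		{Name: "lab-001", Labels: map[string]string{"region": "eu", "tier": "dev"}},
	}
	rule := Rule{Profile: "pg-small", Selector: map[string]string{"region": "eu", "tier": "prod"}}
	fmt.Println(rule.Profile, "->", Targets(rule, sats))
}
```

Evaluating rules against labels rather than cluster names is what lets a newly registered satellite pick up its clusters without anyone editing the rule.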
Get pg-swarm running locally in under a minute.
cd deploy/docker
docker-compose up -d
# Dashboard at http://localhost:8080
# gRPC at localhost:9090
make build # Compile all binaries
make test # Run unit tests
make lint # golangci-lint
make minikube-build-all # Build images
make k8s-deploy-all # Deploy via Kustomize
make k8s-status # Check resources