Centralized management for PostgreSQL High Availability clusters across 100s of edge Kubernetes clusters. Automatic failover, split-brain prevention, and deep observability from a single dashboard.
Edge sites can't be reached directly from outside, so traditional push-based management doesn't work.
A round-trip to a central control plane during a primary failure adds unacceptable latency.
Two primaries writing to the same cluster corrupts data permanently.
Managing hundreds of PG clusters across edge sites with kubectl is impossible.
Satellites connect outward via persistent gRPC streams. No inbound access required.
Leader election via K8s Leases. Promotion in seconds with no central dependency.
Immediate write blocking, automatic pg_rewind, zero-restart container recovery.
Web dashboard with real-time health, replication lag, slow queries, and one-click switchover.
Everything you need to run PostgreSQL HA at the edge, built from scratch in Go.
Per-pod sidecar with Kubernetes Lease-based leader election. Primary failure detected within 15 seconds, promotion in under 5.
SQL fencing blocks writes and kills connections instantly. Old primary auto-demotes to standby via K8s exec. No data corruption.
After failover, replicas auto-detect timeline divergence and recover via pg_rewind or re-basebackup. No manual intervention.
Replication lag, connections, disk, WAL stats, per-database cache hit ratios, table stats, and slow queries from pg_stat_statements.
Satellite-controlled 9-step orchestration with real-time progress tracking via WebSocket. Fence with drain, checkpoint, then promote, with a point-of-no-return indicator.
Reusable cluster templates with deployment rules that auto-deploy to satellites matching label selectors. Fleet-scale management.
Per-pod backup sidecar with GCS and SFTP storage. Physical (base + incremental + WAL archiving) and logical (pg_dump) backups scheduled via internal cron. PITR support. Being reimplemented from scratch.
Builds StatefulSets, Services, ConfigMaps, Secrets, and RBAC from scratch. No Custom Resource Definitions, no operator frameworks, minimal footprint.
Token-based auth with SHA-256 hashing. Constant-time comparison. Create-only secrets. Identity stored in K8s Secrets.
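Hash-then-compare token auth can be sketched in a few lines of standard-library Go. This is a minimal illustration of the pattern described above (function names are hypothetical), not PG-Swarm's actual auth code:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// HashToken stores only the SHA-256 digest of a satellite token, so a
// leaked metadata store never reveals the raw credential.
func HashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// VerifyToken hashes the presented token and compares digests in constant
// time, avoiding timing side channels on the comparison.
func VerifyToken(raw, storedHash string) bool {
	candidate := HashToken(raw)
	return subtle.ConstantTimeCompare([]byte(candidate), []byte(storedHash)) == 1
}

func main() {
	h := HashToken("s3cret")
	fmt.Println(VerifyToken("s3cret", h), VerifyToken("wrong", h)) // true false
}
```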
Postgres container runs in a supervisor loop. Demotion, timeline recovery, and restart happen in-place without incrementing K8s restart counts.
Three-mode config update system: pg_reload_conf for sighup params, rolling restart for postmaster params, full cluster shutdown for replication-sensitive params. Database-driven parameter classification with admin panel.
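Conceptually, the three modes reduce to a lookup from parameter name to apply strategy. The mapping below is a hypothetical subset chosen for illustration; the real classification lives in the metadata database and is editable from the admin panel:

```go
package main

import "fmt"

// ApplyMode returns how a changed postgresql.conf parameter would take
// effect under a three-mode scheme like the one described above.
func ApplyMode(param string) string {
	switch param {
	case "wal_level", "max_wal_senders":
		return "full-shutdown" // replication-sensitive: whole cluster restarts
	case "shared_buffers", "max_connections":
		return "rolling-restart" // postmaster context: restart one pod at a time
	default:
		return "pg_reload_conf" // sighup-reloadable, e.g. log_min_duration_statement
	}
}

func main() {
	for _, p := range []string{"work_mem", "shared_buffers", "wal_level"} {
		fmt.Println(p, "->", ApplyMode(p))
	}
}
```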
Dynamic database and user management at the cluster level. CREATE ROLE + CREATE DATABASE via sidecar command. Per-database IP subnet access control (HBA rules). Zero pod restart.
Centrally managed log-based recovery rules with pattern matching. 40+ built-in patterns for WAL issues, replication failures, and crash loops. Admin panel with regex sandbox.
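A recovery rule is just a compiled pattern paired with an action, evaluated against each log line. The patterns below are real PostgreSQL log messages, but the rule-to-action pairing is an illustrative stand-in for the built-in set:

```go
package main

import (
	"fmt"
	"regexp"
)

// RecoveryRule maps a log pattern to a recovery action.
type RecoveryRule struct {
	Pattern *regexp.Regexp
	Action  string
}

var rules = []RecoveryRule{
	{regexp.MustCompile(`invalid record length at [0-9A-F]+/[0-9A-F]+`), "restart-replay"},
	{regexp.MustCompile(`requested WAL segment .* has already been removed`), "re-basebackup"},
	{regexp.MustCompile(`database system was interrupted`), "crash-recovery-watch"},
}

// Match returns the action of the first rule whose pattern hits, or "".
func Match(line string) string {
	for _, r := range rules {
		if r.Pattern.MatchString(line) {
			return r.Action
		}
	}
	return ""
}

func main() {
	fmt.Println(Match("FATAL: requested WAL segment 0000000100000000000000A1 has already been removed"))
}
```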
Real-time visibility into every PostgreSQL instance across your fleet.
Instance table with role badges, ready/WAL status dots, connection bars, disk usage, and timeline IDs. Expand to see per-pod details.
Drill down to see disk vs WAL breakdown, WAL statistics, per-database sizes with cache hit ratios, table stats, and slow queries.
6-tab configuration editor: General, Volumes, Resources, PostgreSQL params (50+ parameters), HBA Rules, and Recovery Rules.
Map profiles to satellites via label selectors. When a rule matches, clusters are auto-created and pushed. Fleet-scale management.
9-step switchover progress modal with real-time WebSocket updates, point-of-no-return indicator, and rollback status for pre-PONR failures.
Admin panel for PostgreSQL parameter classification. Define which params use pg_reload_conf (sighup), rolling restart (postmaster), or full cluster shutdown (replication-sensitive).
Per-cluster database management tab. CREATE ROLE + CREATE DATABASE via sidecar command. CIDR-based IP access control (HBA rules) per database. Zero pod restart.
Version history with revert for profile changes. Per-cluster approval workflow before config updates are applied. Full audit trail of every configuration change.
State transitions, failovers, switchovers, backup completions, and errors with severity icons. Per-cluster event filtering.
Terminal-style log viewer with SSE streaming, server-side and client-side level filtering, remote log level control, auto-scroll, and clear.
Centrally managed recovery rule sets with inline rule editing, pattern sandbox for testing regex against sample log lines, and per-cluster attachment.
5-tab admin page: Storage Tiers with satellite mappings, Image Variants for postgres base images, PG Version registry, Recovery Rules editor, and Update Rules for parameter classification.
The embedded dashboard in action, running against mock data.
Overview
Clusters
Cluster Detail
Switchover Progress
Satellites
Profiles
Deployment Rules
Events
Admin
Recovery Rules
Update Rules
Bidirectional gRPC streaming across every layer. Continuous log monitoring with pattern-driven recovery rules. Real-time WebSocket state push. Sidecar command dispatch for zero-latency switchover.
cmd/central
gRPC server for satellite streams. REST API with 60+ endpoints. WebSocket hub for real-time updates. Embedded React dashboard. PostgreSQL metadata store with auto-migrations.
cmd/satellite
Lightweight agent per edge cluster. Persistent gRPC stream with auto-reconnection. Kubernetes operator that builds PG clusters from JSON configs.
cmd/failover-sidecar
Per-pod sidecar for leader election via K8s Leases. Detects split-brain, fences writes, demotes old primary, recovers timeline divergence. Log watcher with 40+ recovery patterns. Bidirectional gRPC streaming to satellite for command dispatch.
From primary failure to full recovery in under 20 seconds, with zero manual intervention.
The primary pod crashes or becomes unreachable. The failover sidecar's lease renewal stops.
After 15 seconds without renewal, the leader lease expires. All replica sidecars detect this on their next tick.
One replica wins the lease via optimistic locking (resourceVersion). The others see the conflict and back off.
The winning replica calls pg_promote(), transitions to primary, and labels its pod pg-swarm.io/role=primary.
The Kubernetes RW Service selector picks up the new primary label. Applications reconnect transparently.
Other replicas detect the timeline divergence, run pg_rewind (or re-basebackup), and start streaming from the new primary. No container restarts.
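The last step is a small decision: a replica on a diverged timeline either rewinds or falls back to a fresh base backup. A minimal sketch of that choice (illustrative, not the sidecar's exact code):

```go
package main

import "fmt"

// RecoveryAction decides how a replica rejoins after a failover, based on
// whether its timeline diverged from the new primary's and whether
// pg_rewind is usable (e.g. wal_log_hints enabled, WAL still available).
func RecoveryAction(localTimeline, primaryTimeline int, rewindUsable bool) string {
	switch {
	case localTimeline == primaryTimeline:
		return "stream" // no divergence: reconnect and keep streaming
	case rewindUsable:
		return "pg_rewind" // rewind diverged WAL, then resume streaming
	default:
		return "basebackup" // fall back to a fresh base backup
	}
}

func main() {
	// Old primary on timeline 3 rejoining a new primary on timeline 4.
	fmt.Println(RecoveryAction(3, 4, true)) // pg_rewind
}
```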
Per-pod backup sidecar with role-aware scheduling. Primary handles WAL archiving, replicas run scheduled backups. Internal cron, gzip compression, and point-in-time recovery.
Full base backups and incremental backups (changed blocks only, PostgreSQL 17+). WAL archiving on the primary with archive_command auto-configured via pg_reload_conf. No pod restart needed.
pg_dump-based logical backups with gzip compression. Per-database or full cluster. Scheduled independently from physical backups via internal cron.
Primary pod handles WAL archiving and metadata. Replica pods run scheduled base, incremental, and logical backups. Automatic role detection at sidecar startup.
Backup configuration auto-sets archive_mode, archive_command, and summarize_wal in postgresql.conf. HBA changes applied via pg_reload_conf with zero downtime.
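The resulting postgresql.conf fragment looks roughly like this. The archive helper path is hypothetical; archive_mode and summarize_wal (PostgreSQL 17+) are the settings named above:

```ini
archive_mode = on
archive_command = '/opt/pg-swarm/archive-wal %p %f'  # hypothetical helper path
summarize_wal = on                                   # PG 17+, enables incremental backups
```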
Restore to any point in time from the dashboard. Select a base backup and target timestamp. The satellite handles StatefulSet scaling and recovery setup.
All backups are compressed before upload. Base backups use pg_basebackup -z. Logical dumps are gzipped. Retention managed by the sidecar scheduler.
gcs / sftp
Google Cloud Storage and SFTP servers. Credentials stored as K8s Secrets on each satellite. Destination configured per cluster.
cmd/backup-sidecar
Per-pod sidecar container injected by the satellite operator. Runs alongside PostgreSQL with shared volume access. Internal cron scheduler for all backup types. HTTP API for status and on-demand triggers.
Get PG-Swarm running locally in under a minute.
make build # Compile all binaries
make test # Run unit tests
make lint # golangci-lint
make minikube-build-all   # Build all images for minikube
make k8s-deploy-all       # Deploy to the local cluster
make k8s-status           # Check deployment status
# Push images to your registry
DOCKER_REPO=your.registry/pg-swarm make docker-push-all
# Deploy central (with metadata PG)
kubectl apply -k deploy/k8s/central/base/
# Deploy satellite per edge cluster — edit configmap to set CENTRAL_ADDR, K8S_CLUSTER_NAME, and REGION
kubectl apply -k deploy/k8s/satellite/base/
# Or create a custom kustomize overlay: deploy/k8s/satellite/overlays/prod/kustomization.yaml