Open Source / Apache 2.0

PostgreSQL HA
at the Edge

Centralized management for PostgreSQL High Availability clusters across hundreds of edge Kubernetes clusters. Automatic failover, split-brain prevention, and deep observability from a single dashboard.

500+ Edge Clusters
<5s Failover Time
0 Dependencies
3 Go Binaries

The Problem

Edge clusters are behind NAT

You can't reach them directly. Traditional push-based management doesn't work.

Failover must be local

A round-trip to a central control plane during a primary failure adds unacceptable latency.

Split-brain is catastrophic

Two primaries writing to the same cluster corrupts data permanently.

No visibility at scale

Managing hundreds of PG clusters across edge sites with kubectl alone doesn't scale.

The Solution

Satellite-initiated connections

Satellites connect outward via persistent gRPC streams. No inbound access required.

Per-pod failover sidecar

Leader election via K8s Leases. Promotion in seconds with no central dependency.

SQL fencing + timeline recovery

Immediate write blocking, automatic pg_rewind, zero-restart container recovery.

Single pane of glass

Web dashboard with real-time health, replication lag, slow queries, and one-click switchover.

Features

Everything you need to run PostgreSQL HA at the edge, built from scratch in Go.

Automatic Failover

Per-pod sidecar with Kubernetes Lease-based leader election. Primary failure detected within 15 seconds; promotion in under 5.


Split-Brain Prevention

SQL fencing blocks writes and kills connections instantly. Old primary auto-demotes to standby via K8s exec. No data corruption.
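The fence can be sketched in Go against a minimal exec interface. The statements below are illustrative, assuming a read-only fence plus connection termination; the exact SQL pg-swarm issues may differ, and the `Execer`/`FenceWrites` names are hypothetical, not its actual API.

```go
package main

import "fmt"

// Execer abstracts the one method fencing needs, so this sketch can run
// against a fake; the sidecar would use a real pgx/v5 or database/sql
// connection to the local instance.
type Execer interface {
	Exec(query string) error
}

// FenceWrites blocks new writes and kills existing client connections on a
// primary that has lost its lease.
func FenceWrites(db Execer) error {
	stmts := []string{
		// Make every new transaction read-only by default.
		"ALTER SYSTEM SET default_transaction_read_only = on",
		"SELECT pg_reload_conf()",
		// Kick existing client backends so in-flight writers cannot continue.
		"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND backend_type = 'client backend'",
	}
	for _, s := range stmts {
		if err := db.Exec(s); err != nil {
			return fmt.Errorf("fencing failed at %q: %w", s, err)
		}
	}
	return nil
}

// recorder is a fake Execer that records what was executed.
type recorder struct{ queries []string }

func (r *recorder) Exec(q string) error { r.queries = append(r.queries, q); return nil }

func main() {
	r := &recorder{}
	if err := FenceWrites(r); err != nil {
		panic(err)
	}
	fmt.Println(len(r.queries)) // 3
}
```

Ordering matters: the read-only fence lands before backends are terminated, so a writer that reconnects immediately still cannot commit.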


Timeline Recovery

After failover, replicas auto-detect timeline divergence and recover via pg_rewind or re-basebackup. No manual intervention.
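The rejoin decision reduces to comparing timeline IDs. A minimal sketch, with hypothetical names (`ChooseRecovery`, `rewindUsable`); a real sidecar would also verify wal_log_hints or data checksums before trusting pg_rewind:

```go
package main

import "fmt"

// RecoveryAction says how a replica rejoins after failover, based on the
// timeline IDs reported locally and by the new primary.
type RecoveryAction string

const (
	None       RecoveryAction = "none"       // timelines match, keep streaming
	Rewind     RecoveryAction = "pg_rewind"  // diverged, rewind onto new timeline
	Basebackup RecoveryAction = "basebackup" // rewind impossible, re-clone
)

func ChooseRecovery(localTL, primaryTL int, rewindUsable bool) RecoveryAction {
	if localTL == primaryTL {
		return None
	}
	if rewindUsable {
		return Rewind
	}
	return Basebackup
}

func main() {
	fmt.Println(ChooseRecovery(3, 3, true))  // none
	fmt.Println(ChooseRecovery(3, 4, true))  // pg_rewind
	fmt.Println(ChooseRecovery(3, 4, false)) // basebackup
}
```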


Deep Observability

Replication lag, connections, disk, WAL stats, per-database cache hit ratios, table stats, and slow queries from pg_stat_statements.


Planned Switchover

One-click primary promotion from the dashboard. CHECKPOINT, fence, lease transfer, promote - with step-by-step confirmation.


Profiles & Rules

Reusable cluster templates with deployment rules that auto-deploy to satellites matching label selectors. Fleet-scale management.


No CNPG Dependency

Builds StatefulSets, Services, ConfigMaps, Secrets, and RBAC from scratch. No CRDs, no operator frameworks, minimal footprint.


Secure by Default

Token-based auth with SHA-256 hashing. Constant-time comparison. Create-only secrets. Identity stored in K8s Secrets.
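The token flow described above can be sketched entirely with Go's standard library. `HashToken` and `VerifyToken` are illustrative names, not pg-swarm's actual API:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// HashToken stores only the SHA-256 digest of a token, never the token itself.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// VerifyToken compares a presented token against a stored hash in constant
// time, so response timing leaks nothing about how many bytes matched.
func VerifyToken(presented, storedHash string) bool {
	sum := sha256.Sum256([]byte(presented))
	got := hex.EncodeToString(sum[:])
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	stored := HashToken("s3cret")
	fmt.Println(VerifyToken("s3cret", stored)) // true
	fmt.Println(VerifyToken("wrong", stored))  // false
}
```

Hashing before the constant-time compare also normalizes both inputs to the same length, which `subtle.ConstantTimeCompare` requires to return a meaningful result.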

Zero-Restart Recovery

Postgres container runs in a supervisor loop. Demotion, timeline recovery, and restart happen in-place without K8s restart counts.

Architecture

Three Go binaries. Bidirectional gRPC streaming. No external dependencies beyond PostgreSQL.

[Architecture diagram]
Central Control Plane: gRPC :9090, REST :8080, Dashboard, PostgreSQL (metadata)
⇅ bidirectional gRPC streams
Satellite A ... Satellite N: Operator + Health Monitor, one per K8s edge cluster
pod-0 PRIMARY, pod-1 REPLICA, pod-2 REPLICA, each with a Failover Sidecar (FS)

Central

cmd/central

gRPC server for satellite streams. REST API with 30+ endpoints. Embedded React dashboard. PostgreSQL metadata store with auto-migrations.

Satellite

cmd/satellite

Lightweight agent per edge cluster. Persistent gRPC stream with auto-reconnection. Kubernetes operator that builds PG clusters from JSON configs.

Failover Sidecar

cmd/failover-sidecar

Per-pod sidecar for leader election via K8s Leases. Detects split-brain, fences writes, demotes old primary, recovers timeline divergence.

How Failover Works

From primary failure to full recovery in under 20 seconds, with zero manual intervention.

1

Primary Dies

The primary pod crashes or becomes unreachable. The failover sidecar's lease renewal stops.

2

Lease Expires (15s)

After 15 seconds without renewal, the leader lease expires. All replica sidecars detect this on their next tick.

3

Replica Acquires Lease

One replica wins the lease via optimistic locking (resourceVersion). The others see the conflict and back off.

4

pg_promote()

The winning replica calls pg_promote(), transitions to primary, and labels its pod pg-swarm.io/role=primary.

5

RW Service Routes

The Kubernetes RW Service selector picks up the new primary label. Applications reconnect transparently.

6

Timeline Recovery

Other replicas detect the timeline divergence, run pg_rewind (or re-basebackup), and start streaming from the new primary. No container restarts.

Web Dashboard

Real-time visibility into every PostgreSQL instance across your fleet.

Cluster Overview

Instance table with role badges, ready/WAL status dots, connection bars, disk usage, and timeline IDs. Expand to see per-pod details.

Instance Detail

Drill down to see disk vs WAL breakdown, WAL statistics, per-database sizes with cache hit ratios, table stats, and slow queries.

Profile Editor

6-tab configuration editor: General, Volumes, Resources, PostgreSQL params (full catalog with 50+ parameters), HBA Rules, and Databases.

Deployment Rules

Map profiles to satellites via label selectors. When a rule matches, clusters are auto-created and pushed. Fleet-scale management.
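The matching itself can be sketched as an equality-based label selector, the simplest form Kubernetes uses; pg-swarm's actual rule syntax may be richer, and `Matches` plus the example labels are hypothetical:

```go
package main

import "fmt"

// Matches reports whether a satellite's labels satisfy a rule's selector:
// every key/value pair in the selector must be present on the satellite.
func Matches(selector, satelliteLabels map[string]string) bool {
	for k, v := range selector {
		if satelliteLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	rule := map[string]string{"region": "eu", "tier": "retail"}
	storeBerlin := map[string]string{"region": "eu", "tier": "retail", "store": "berlin-01"}
	dcOregon := map[string]string{"region": "us", "tier": "dc"}
	fmt.Println(Matches(rule, storeBerlin)) // true: profile auto-deploys here
	fmt.Println(Matches(rule, dcOregon))    // false: rule skips this satellite
}
```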

One-Click Switchover

Promote any replica to primary with a detailed confirmation modal showing each step and a downtime warning.

Event Log

State transitions, failovers, switchovers, and errors with severity icons. Per-cluster event filtering.

Tech Stack

Language: Go 1.26
Communication: gRPC + Protobuf v3
Database: PostgreSQL (pgx/v5)
REST API: GoFiber v2
Logging: zerolog
Dashboard: React 19 + Vite + JSX
K8s Client: client-go v0.35
Deployment: Docker + Kustomize

Quick Start

Get pg-swarm running locally in under a minute.

Docker Compose (local dev)
cd deploy/docker
docker-compose up -d

# Dashboard at http://localhost:8080
# gRPC at localhost:9090
Build from source
make build        # Compile all binaries
make test         # Run unit tests
make lint         # golangci-lint
Kubernetes (minikube)
make minikube-build-all    # Build images
make k8s-deploy-all       # Deploy via Kustomize
make k8s-status           # Check resources