How to Connect Kubernetes Pods to Azure PostgreSQL via IPsec VPN with PgBouncer and EndpointSlice

A comprehensive guide on setting up a site-to-site IPsec VPN tunnel using Libreswan to securely connect Kubernetes (k3s) workloads to Azure PostgreSQL through PgBouncer connection pooling and Kubernetes EndpointSlice, all automated with SaltStack.

When your Kubernetes workloads live on Hetzner Cloud but your PostgreSQL database sits inside an Azure Virtual Network, you need a secure, reliable bridge between the two. A direct public-internet connection to the database is a non-starter for security reasons, and Azure Private Link does not extend to non-Azure infrastructure.

The solution is a site-to-site IPsec VPN tunnel that connects your infrastructure to the Azure VNet, combined with PgBouncer for connection pooling, and a Kubernetes Service with EndpointSlice to give pods a clean, DNS-discoverable database endpoint.

In this guide, I will walk through the complete setup: establishing the IPsec tunnel with Libreswan, configuring PgBouncer as a VPN-aware connection pooler, securing the gateway with firewall rules, setting up automated tunnel monitoring, and finally exposing the database to Kubernetes pods through an EndpointSlice. All of it is automated with SaltStack, but the concepts and configurations apply regardless of your configuration management tool.

Architecture Overview

Before diving into the implementation, let us understand what we are building and why each component exists.

┌────────────────────────────────────────────────────────────────────────┐
│  Hetzner Cloud (10.5.0.0/16)                                           │
│                                                                        │
│  ┌───────────────────┐      ┌──────────────────────────────────────┐   │
│  │  k3s Pods         │      │  VPN Gateway (vpn-gw)                │   │
│  │  (10.42.x.x)      │──────│  Public IP: 46.110.80.45             │   │
│  │                   │      │  Tunnel IP: 10.5.0.22                │   │
│  │  Connect via:     │      │                                      │   │
│  │  citicorp-postgres│      │  ┌──────────────────────┐            │   │
│  │  .stage.svc       │      │  │  PgBouncer :5432     │            │   │
│  │  .cluster.local   │      │  │  Connection pooling  │            │   │
│  │  :5432            │      │  └──────────┬───────────┘            │   │
│  └───────────────────┘      │             │                        │   │
│                             │  ┌──────────▼───────────┐            │   │
│                             │  │  Libreswan IPsec     │            │   │
│                             │  │  IKEv2 Tunnel        │            │   │
│                             │  └──────────┬───────────┘            │   │
│                             └─────────────┼────────────────────────┘   │
└─────────────────────────────────────────────┼──────────────────────────┘
                       IPsec Tunnel (AES-256, SHA-1, DH Group 14)
┌─────────────────────────────────────────────▼──────────────────────────┐
│  Azure VNet (10.70.0.0/16)                                             │
│                                                                        │
│  ┌────────────────────────────────┐                                    │
│  │  Azure PostgreSQL              │                                    │
│  │  (Private Endpoint)            │                                    │
│  │  statement-analyzer-uat-db     │                                    │
│  │  .postgres.database.azure.com  │                                    │
│  └────────────────────────────────┘                                    │
│                                                                        │
│  Primary VPN Gateway:  42.170.70.119                                   │
│  Backup VPN Gateway:   24.99.201.54                                    │
└────────────────────────────────────────────────────────────────────────┘

Why This Architecture?

Why not connect directly from pods to Azure PostgreSQL?

Azure PostgreSQL with a private endpoint is only accessible from within the Azure VNet. There is no public endpoint. You need a network-level bridge – the IPsec tunnel – to extend your network into the Azure VNet.

Why PgBouncer?

Kubernetes pods are ephemeral. They start, die, and restart constantly. Each new pod connection to PostgreSQL is expensive – it involves a TCP handshake, TLS negotiation, and PostgreSQL authentication. Without connection pooling, you would quickly exhaust the database connection limit (Azure PostgreSQL enforces hard limits based on your tier). PgBouncer sits on the VPN gateway, maintains a pool of persistent connections to Azure PostgreSQL through the tunnel, and multiplexes hundreds of pod connections over those few pooled connections.

Why EndpointSlice instead of ExternalName Service?

An ExternalName Service creates a CNAME DNS record, which interacts poorly with PostgreSQL clients that validate hostnames during TLS. A Service without a selector, backed by an EndpointSlice, instead gives you a stable ClusterIP that routes to the VPN gateway’s IP address. Pods connect to citicorp-postgres.stage.svc.cluster.local:5432 just as they would to any in-cluster database.

Why a dedicated VPN gateway server?

Isolating the VPN tunnel on its own server keeps the blast radius small. If the tunnel has issues, it does not affect your Kubernetes nodes. The gateway server also serves as a single point where you can apply firewall rules, monitor tunnel health, and manage PgBouncer independently.

Prerequisites

Before you begin, you need the following:

  • A dedicated Linux server for the VPN gateway (this guide uses Rocky Linux 10 on Hetzner Cloud cx23)
  • A running Kubernetes (k3s) cluster
  • An Azure Virtual Network Gateway with a site-to-site VPN configured
  • The following information from the Azure side:
    • Azure VPN Gateway public IP(s)
    • Azure VNet subnet(s) you need to reach
    • A pre-shared key (PSK) agreed upon by both parties
    • The Azure PostgreSQL private endpoint IP/hostname
    • PostgreSQL database credentials
  • SaltStack master configured (optional – you can adapt the configs for manual deployment)

Step 1: Setting Up the IPsec VPN Tunnel with Libreswan

Libreswan is the native IPsec implementation for RHEL-based distributions (Rocky Linux, AlmaLinux, CentOS). It is well-maintained, supports both IKEv1 and IKEv2, and integrates cleanly with the Linux kernel’s XFRM framework for packet encryption.

1.1 Install Libreswan

dnf install -y libreswan

1.2 Configure Kernel Parameters

IPsec tunneling requires specific kernel parameters. IP forwarding must be enabled because the VPN gateway routes packets between your network and the Azure VNet. Reverse path filtering must be disabled because IPsec decapsulated packets arrive on a different interface than expected, and the kernel would otherwise drop them.

# Persist all IPsec-related kernel parameters in one file (idempotent on re-runs,
# unlike repeated appends)
cat > /etc/sysctl.d/99-ipsec.conf << 'EOF'
# Enable IP forwarding (required for tunnel routing)
net.ipv4.ip_forward = 1

# Disable reverse path filtering (required for IPsec)
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0

# Disable ICMP redirects (security best practice for VPN gateways)
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
EOF

# Apply immediately without a reboot (re-reads all sysctl.d files)
sysctl --system

Why disable ICMP redirects? On a VPN gateway, ICMP redirects can cause routing loops or allow an attacker to redirect traffic away from the tunnel. Disabling them ensures all traffic follows the intended IPsec routing path.

1.3 Initialize the NSS Database

Libreswan uses Mozilla’s NSS (Network Security Services) library for cryptographic operations. The NSS database must be initialized before Libreswan can start:

ipsec initnss --nssdir /var/lib/ipsec/nss

This creates the certificate database (cert9.db, key4.db) that Libreswan uses internally, even when you are using pre-shared keys rather than certificates.

1.4 Configure the IPsec Connection

Create the connection configuration file at /etc/ipsec.d/citicorp.conf:

conn citicorp-azure
    type=tunnel
    authby=secret
    ikev2=insist

    # Our side (left = local)
    left=46.110.80.45
    leftid=46.110.80.45
    leftsubnet=10.5.0.22/32
    leftsourceip=10.5.0.22

    # Their side (right = remote / Azure)
    right=42.170.70.119
    rightid=42.170.70.119
    rightsubnet=10.1.0.4/32,10.70.0.25/32,10.70.3.69/32

    # Encryption parameters
    ike=aes256-sha1;modp2048
    phase2alg=aes256-sha1
    pfs=no

    # Lifetimes
    ikelifetime=3600s
    salifetime=3600s

    # Bring up automatically and keep alive
    auto=start
    dpddelay=30
    dpdtimeout=120

    # Azure compatibility
    fragmentation=yes
    narrowing=yes

Let me break down the key parameters:

  • type=tunnel – Tunnel mode encapsulates entire IP packets, which is required for a site-to-site VPN (as opposed to transport mode, which encrypts only the payload).
  • authby=secret – Both sides authenticate using a pre-shared key. Simpler than certificate-based auth for a single tunnel.
  • ikev2=insist – Forces IKEv2, which is more secure and reliable than IKEv1; insist ensures the connection will not fall back to IKEv1.
  • left / right – The public IP addresses of each VPN gateway. “Left” is always the local side.
  • leftsubnet / rightsubnet – Define which traffic is routed through the tunnel. Our side exposes only the gateway IP (10.5.0.22/32), while the Azure side exposes multiple database endpoints.
  • leftsourceip=10.5.0.22 – The source IP address used for packets entering the tunnel. This ensures Azure sees traffic coming from our tunnel IP, not our public IP.
  • ike=aes256-sha1;modp2048 – Phase 1 parameters negotiated during IKE (Internet Key Exchange): AES-256 encryption, SHA-1 hashing, and DH Group 14 (2048-bit) for key exchange.
  • phase2alg=aes256-sha1 – The actual data-encryption algorithm for tunnel traffic (Phase 2).
  • pfs=no – Perfect Forward Secrecy would generate new keys for each Phase 2 negotiation; it is disabled here to match Azure’s default configuration.
  • auto=start – The tunnel is established automatically when the IPsec service starts.
  • dpddelay=30 / dpdtimeout=120 – Dead Peer Detection sends a keepalive probe every 30 seconds; if no response arrives within 120 seconds, Libreswan considers the peer dead and attempts to re-establish the tunnel.
  • fragmentation=yes – Azure VPN Gateways sometimes send large IKE packets; enabling IKE fragmentation prevents packet drops on networks with lower MTU.
  • narrowing=yes – Allows Azure to narrow the proposed traffic selectors during Phase 2 negotiation. Required for compatibility with Azure VPN Gateways that use specific subnet combinations.

1.5 Configure the Pre-Shared Key

Create the secrets file at /etc/ipsec.d/citicorp.secrets with restrictive permissions:

# Format: <local_ip> <remote_ip> : PSK "shared_secret"
46.110.80.45 42.170.70.119 : PSK "YOUR_STRONG_PSK_HERE"

Set the permissions so only root can read it:

chmod 0600 /etc/ipsec.d/citicorp.secrets

Important: The pre-shared key must be agreed upon with the Azure VPN Gateway administrator. Use a strong, randomly generated key of at least 32 characters.
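To generate such a key, a sketch using the openssl CLI (which ships with Rocky Linux); base64 output avoids characters that would need extra quoting in the secrets file:

```shell
# 48 random bytes, base64-encoded (~64 characters) – paste the same value
# into the quoted PSK string on both the Hetzner and Azure sides.
openssl rand -base64 48
```

Run it once and share the value with the Azure administrator over a secure channel.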

1.6 Optional: Backup Azure Gateway

If the Azure side provides a secondary VPN Gateway for high availability, add a backup connection to the same configuration file:

conn citicorp-azure-backup
    type=tunnel
    authby=secret
    ikev2=insist

    left=46.110.80.45
    leftid=46.110.80.45
    leftsubnet=10.5.0.22/32
    leftsourceip=10.5.0.22

    right=24.99.201.54
    rightid=24.99.201.54
    rightsubnet=10.1.0.4/32,10.70.0.25/32,10.70.3.69/32

    ike=aes256-sha1;modp2048
    phase2alg=aes256-sha1
    pfs=no

    ikelifetime=3600s
    salifetime=3600s

    auto=start
    dpddelay=30
    dpdtimeout=120
    fragmentation=yes
    narrowing=yes

Add a corresponding PSK entry in the secrets file:

46.110.80.45 24.99.201.54 : PSK "YOUR_STRONG_PSK_HERE"

1.7 Start the Tunnel

# Enable and start the IPsec service
systemctl enable --now ipsec

# Bring up the connection
ipsec up citicorp-azure

1.8 Verify the Tunnel

# Check overall status
ipsec status

# Check traffic flowing through the tunnel
ipsec trafficstatus

# Verify kernel XFRM policies and states
ip xfrm state
ip xfrm policy

# Test connectivity to Azure endpoint through the tunnel
ping -c 3 10.70.0.25

A successful tunnel shows ESTABLISHED_IKE_SA (IKE Phase 1 completed) and ESTABLISHED_CHILD_SA (IPsec Phase 2 active with traffic selectors installed).
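Beyond a simple ping, ipsec trafficstatus exposes per-connection byte counters, which confirm that traffic is actually flowing through the tunnel. The sample line below is illustrative of Libreswan's output (field names and layout can vary between versions); a little awk extracts the counters:

```shell
# Extract inBytes/outBytes counters from `ipsec trafficstatus`-style output.
parse_traffic() {
    awk -F'[ ,]+' '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^inBytes=/)  { split($i, a, "="); rx = a[2] }
            if ($i ~ /^outBytes=/) { split($i, a, "="); tx = a[2] }
        }
        printf "rx=%d tx=%d\n", rx, tx
    }'
}

# On the gateway: ipsec trafficstatus | grep citicorp-azure | parse_traffic
# Offline demo with an illustrative sample line:
echo '006 #2: "citicorp-azure", type=ESP, add_time=1700000000, inBytes=52412, outBytes=38007' | parse_traffic
```

Rising counters on both sides are the quickest sign that the tunnel is carrying real traffic, not just keepalives.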

Step 2: Setting Up PgBouncer as a VPN-Aware Connection Pooler

With the tunnel up, you could technically have pods connect directly to the Azure PostgreSQL IP through the VPN gateway. But this approach has serious problems:

  1. Connection exhaustion: Every pod creates its own connection. With dozens of pods restarting frequently, you quickly hit Azure PostgreSQL’s connection limit.
  2. Latency multiplication: Each new connection traverses the VPN tunnel for TCP handshake + TLS + PostgreSQL auth. PgBouncer maintains persistent connections and avoids this overhead.
  3. Single endpoint: PgBouncer gives pods a single, stable IP:port to connect to, regardless of which Azure PostgreSQL IP is active.
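These trade-offs translate into concrete numbers. Using the pool sizes configured later in this guide's pgbouncer.ini (the Azure limit of 100 is an illustrative assumption; check your actual tier), a quick calculation shows the multiplexing effect:

```shell
# Sanity-check PgBouncer sizing against the database connection limit.
max_client_conn=500        # client (pod) connections PgBouncer accepts
default_pool_size=20       # server connections per user/database pair
reserve_pool_size=5        # extra server connections under load
max_db_connections=50      # PgBouncer's hard cap toward Azure PostgreSQL
azure_max_connections=100  # illustrative Azure tier limit (assumption)

worst_case=$(( default_pool_size + reserve_pool_size ))
ratio=$(( max_client_conn / max_db_connections ))
headroom=$(( azure_max_connections - max_db_connections ))

echo "worst-case pool connections: ${worst_case} (cap: ${max_db_connections})"
echo "client-to-server multiplexing: ${ratio}:1"
echo "admin-connection headroom on Azure: ${headroom}"
```

Ten pod connections sharing each pooled server connection is what keeps dozens of restarting pods from ever approaching the Azure limit.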

2.1 Install PgBouncer

dnf install -y pgbouncer

2.2 Create the PgBouncer System User

useradd --system --home-dir /var/lib/pgbouncer --shell /bin/false pgbouncer

2.3 Create Required Directories

mkdir -p /etc/pgbouncer /var/log/pgbouncer /var/run/pgbouncer
chown pgbouncer:pgbouncer /etc/pgbouncer /var/log/pgbouncer /var/run/pgbouncer
chmod 750 /etc/pgbouncer
chmod 755 /var/log/pgbouncer /var/run/pgbouncer

Create a tmpfiles.d configuration to ensure the runtime directory is recreated after reboots (since /var/run is a tmpfs on most modern distributions):

cat > /etc/tmpfiles.d/pgbouncer.conf << 'EOF'
# PgBouncer runtime directory
d /var/run/pgbouncer 0755 pgbouncer pgbouncer -
EOF

2.4 Configure PgBouncer

Create /etc/pgbouncer/pgbouncer.ini:

;; PgBouncer configuration file (VPN proxy mode)
;; Proxies connections to Azure PostgreSQL through IPsec tunnel

[databases]
; Route database connections through VPN tunnel to Azure PostgreSQL
statement_analyzer_uat = host=statement-analyzer-uat-db.postgres.database.azure.com port=5432

[pgbouncer]
;;;
;;; Administrative settings
;;;

logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid

;;;
;;; Where to wait for clients
;;;

; Listen on all interfaces so k3s nodes can connect
listen_addr = *
listen_port = 5432

; Unix socket for local admin connections
unix_socket_dir = /var/run/pgbouncer

;;;
;;; Authentication settings
;;;

; SCRAM-SHA-256 is the modern PostgreSQL authentication method
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

;;;
;;; Connection pooling settings
;;;

; Transaction mode: connections are returned to the pool after each transaction.
; This gives the best connection reuse ratio. Session mode would tie a server
; connection to a client for the entire session, defeating the purpose of pooling.
pool_mode = transaction

; Maximum number of connections PgBouncer will open to Azure PostgreSQL.
; This should be set below the Azure PostgreSQL max_connections limit to leave
; room for administrative connections.
max_db_connections = 50

; Maximum number of client (pod) connections PgBouncer will accept.
; Pods beyond this limit will wait in the queue.
max_client_conn = 500

; Default number of server connections per user/database pair.
default_pool_size = 20

; Minimum number of server connections to keep open, even if idle.
; Keeps the pool warm to avoid connection setup latency on the first request.
min_pool_size = 3

; Extra connections allowed when the pool is exhausted and clients are waiting.
reserve_pool_size = 5
reserve_pool_timeout = 5

;;;
;;; Connection timeouts (tuned for VPN latency)
;;;

; Close server connections that have been idle for this many seconds.
; Set higher than default (600) because reconnecting through the VPN is expensive.
server_idle_timeout = 300

; Close and recreate server connections after this many seconds.
; Prevents stale connections from accumulating. 1800s = 30 minutes.
server_lifetime = 1800

; Do not timeout idle client connections (application manages its own lifecycle).
client_idle_timeout = 0

; No query timeout (application is responsible for query timeouts).
query_timeout = 0

; Maximum time a client will wait for a server connection from the pool.
; 120s is generous, covering VPN latency spikes and pool contention.
query_wait_timeout = 120

; Maximum time to wait for a TCP connection to Azure PostgreSQL through the VPN.
; 15s accounts for VPN tunnel latency. Default of 5s is too aggressive for cross-cloud.
server_connect_timeout = 15

;;;
;;; Logging
;;;

log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
verbose = 0

;;;
;;; Console access control
;;;

admin_users = pgbouncer_admin
stats_users = pgbouncer_admin

;;;
;;; Connection sanity checks
;;;

; Check server connection health every 30 seconds with a lightweight query
server_check_query = SELECT 1
server_check_delay = 30

; Append the client hostname to application_name for debugging
application_name_add_host = 1

2.5 Configure Database Credentials

Create /etc/pgbouncer/userlist.txt with the credentials that pods will use to authenticate:

"spin_admin" "SCRAM-SHA-256$4096:salt_here==$stored_key_here=:server_key_here="

How to generate a SCRAM-SHA-256 hash: Connect to any PostgreSQL instance, create a user with the desired password, then query the hash:

CREATE USER spin_admin WITH PASSWORD 'your_password';
SELECT rolname, rolpassword FROM pg_authid WHERE rolname = 'spin_admin';

The rolpassword value (starting with SCRAM-SHA-256$) goes into userlist.txt.
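The exact quoting PgBouncer expects in userlist.txt is easy to get wrong by hand. A small helper can append entries consistently (a sketch; the verifier in the usage comment is a placeholder for your real rolpassword value):

```shell
# Append a '"user" "verifier"' line to a PgBouncer userlist file.
add_userlist_entry() {
    local user="$1" verifier="$2" file="$3"
    printf '"%s" "%s"\n' "$user" "$verifier" >> "$file"
}

# On the gateway (placeholder verifier shown):
#   add_userlist_entry spin_admin 'SCRAM-SHA-256$4096:salt==...' /etc/pgbouncer/userlist.txt
```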

Set permissions:

chown pgbouncer:pgbouncer /etc/pgbouncer/userlist.txt
chmod 640 /etc/pgbouncer/userlist.txt

2.6 Create the Systemd Service

PgBouncer must start after the IPsec tunnel is up. Create /etc/systemd/system/pgbouncer.service:

[Unit]
Description=PgBouncer - PostgreSQL connection pooler (VPN proxy mode)
Documentation=man:pgbouncer(1)
After=network.target ipsec.service
Wants=ipsec.service

[Service]
Type=forking
User=pgbouncer
Group=pgbouncer
PIDFile=/var/run/pgbouncer/pgbouncer.pid

# Create runtime directory
RuntimeDirectory=pgbouncer
RuntimeDirectoryMode=0755
ExecStartPre=+/usr/bin/mkdir -p /var/run/pgbouncer
ExecStartPre=+/usr/bin/chown pgbouncer:pgbouncer /var/run/pgbouncer
ExecStartPre=+/usr/bin/chmod 0755 /var/run/pgbouncer

# PgBouncer configuration
ExecStart=/usr/bin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
ExecReload=/bin/kill -HUP $MAINPID

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/pgbouncer /var/run/pgbouncer

# Resource limits (allow many client connections)
LimitNOFILE=65536

# Restart on failure with a 5-second delay
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Key details about this service file:

  • After=network.target ipsec.service ensures PgBouncer starts only after networking and the IPsec tunnel are ready.
  • Wants=ipsec.service creates a soft dependency – PgBouncer wants the IPsec service to be running but will not fail if it is not.
  • ProtectSystem=strict and ProtectHome=true are systemd security hardening directives that prevent PgBouncer from writing to system directories or reading home directories.
  • LimitNOFILE=65536 raises the file descriptor limit to support 500 client connections (each connection needs a file descriptor).

2.7 Start PgBouncer

systemctl daemon-reload
systemctl enable --now pgbouncer

2.8 Verify PgBouncer

# Check service status
systemctl status pgbouncer

# Connect to PgBouncer admin console
psql -h 127.0.0.1 -p 5432 -U pgbouncer_admin pgbouncer

# Inside the admin console, check pools and stats:
# SHOW POOLS;
# SHOW STATS;
# SHOW SERVERS;

# Test a database connection through the tunnel
psql -h 127.0.0.1 -p 5432 -U spin_admin -d statement_analyzer_uat -c "SELECT 1"

Step 3: Securing the VPN Gateway with Firewall Rules

The VPN gateway is exposed to the public internet, so locking down access is critical. We use firewalld with rich rules to allow only the specific traffic we need.

3.1 IPsec Tunnel Traffic

The IPsec tunnel requires three types of traffic from the Azure VPN Gateway:

# IKE (Internet Key Exchange) - UDP port 500
# This is how the two gateways negotiate encryption parameters
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="42.170.70.119" port port="500" protocol="udp" accept'

# NAT-T (NAT Traversal) - UDP port 4500
# Used when either side is behind a NAT device. Always enabled for compatibility.
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="42.170.70.119" port port="4500" protocol="udp" accept'

# ESP (Encapsulating Security Payload) - IP protocol 50
# The actual encrypted tunnel traffic. This is not a port, it is an IP protocol.
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="42.170.70.119" protocol value="esp" accept'

If you have a backup Azure gateway, add the same rules for its IP:

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="24.99.201.54" port port="500" protocol="udp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="24.99.201.54" port port="4500" protocol="udp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="24.99.201.54" protocol value="esp" accept'

3.2 PgBouncer Access from Kubernetes Nodes

Only your Kubernetes cluster nodes should be able to reach PgBouncer. Add a rich rule for each node:

# Allow each k3s node to connect to PgBouncer on port 5432
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="116.203.151.218/32" port port="5432" protocol="tcp" accept'  # k8s-master
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="136.243.4.230/32" port port="5432" protocol="tcp" accept'    # k8s-beast-2
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="49.12.84.253/32" port port="5432" protocol="tcp" accept'     # k8s-beast-3
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="176.9.18.139/32" port port="5432" protocol="tcp" accept'     # k8s-beast-4
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="88.99.248.52/32" port port="5432" protocol="tcp" accept'      # k8s-beast-5
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="88.99.93.186/32" port port="5432" protocol="tcp" accept'      # k8s-beast-6
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="88.99.67.78/32" port port="5432" protocol="tcp" accept'       # k8s-robot-node-0
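Maintaining one hand-written rule per node invites drift as the cluster grows (under SaltStack you would render these from pillar data anyway). A loop over the node list generates the same commands – this sketch echoes them for review; drop the echo to apply (IPs are the node IPs from above):

```shell
# Generate one PgBouncer firewall rule per Kubernetes node IP.
K8S_NODES=(
    116.203.151.218   # k8s-master
    136.243.4.230     # k8s-beast-2
    49.12.84.253      # k8s-beast-3
    176.9.18.139      # k8s-beast-4
    88.99.248.52      # k8s-beast-5
    88.99.93.186      # k8s-beast-6
    88.99.67.78       # k8s-robot-node-0
)

for ip in "${K8S_NODES[@]}"; do
    echo firewall-cmd --permanent --add-rich-rule="rule family=\"ipv4\" source address=\"${ip}/32\" port port=\"5432\" protocol=\"tcp\" accept"
done
```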

3.3 Monitoring Access

If you run Prometheus Node Exporter on the gateway for monitoring:

# Allow k3s nodes to scrape node_exporter metrics on port 9100
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="116.203.151.218/32" port port="9100" protocol="tcp" accept'
# ... repeat for each node

3.4 Apply the Rules

firewall-cmd --reload

# Verify
firewall-cmd --list-rich-rules

Step 4: Automated Tunnel Monitoring and Recovery

VPN tunnels can drop for many reasons: network blips, Azure gateway maintenance, DPD timeout, key renegotiation failures. An unmonitored tunnel means silent database outages. We deploy a monitoring script that runs every 2 minutes via cron, checks tunnel health, attempts automatic recovery, and alerts on failure.

4.1 The Monitoring Script

Create /usr/local/bin/ipsec-monitor.sh:

#!/bin/bash
# IPsec Tunnel Monitoring Script
# Monitors the VPN tunnel, restarts if down, alerts if unrecoverable
# Designed for Libreswan 5.x on Rocky Linux
# Runs via cron every 2 minutes

set -uo pipefail

LOCK_FILE="/var/run/ipsec-monitor.lock"
LOG_FILE="/var/log/ipsec-monitor.log"
WEBHOOK_URL_FILE="/etc/ipsec.d/webhook_url"
MAX_RESTART_ATTEMPTS=3
RESTART_WAIT=15
PRIMARY_CONN="citicorp-azure"
TUNNEL_TEST_HOST="statement-analyzer-uat-db.postgres.database.azure.com"

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') [ipsec-monitor] $1" >> "${LOG_FILE}"
}

alert() {
    local message="$1"
    log "ALERT: ${message}"

    # Send webhook alert (Slack/Teams) if configured
    if [[ -f "${WEBHOOK_URL_FILE}" ]]; then
        local webhook_url
        webhook_url=$(cat "${WEBHOOK_URL_FILE}" 2>/dev/null | tr -d '[:space:]')
        if [[ -n "${webhook_url}" ]]; then
            local hostname
            hostname=$(hostname -f 2>/dev/null || hostname)
            local payload
            payload=$(printf '{"text":"[%s] %s"}' "${hostname}" "${message}")
            curl -sf -X POST -H 'Content-Type: application/json' \
                -d "${payload}" "${webhook_url}" \
                --max-time 10 2>/dev/null || log "WARNING: Failed to send webhook alert"
        fi
    fi
}

# Prevent concurrent runs
acquire_lock() {
    if [[ -f "${LOCK_FILE}" ]]; then
        local lock_pid
        lock_pid=$(cat "${LOCK_FILE}" 2>/dev/null || echo "")
        if [[ -n "${lock_pid}" ]] && kill -0 "${lock_pid}" 2>/dev/null; then
            log "Another instance is running (PID ${lock_pid}), exiting"
            exit 0
        fi
        log "Stale lock file found (PID ${lock_pid}), removing"
        rm -f "${LOCK_FILE}"
    fi
    echo $$ > "${LOCK_FILE}"
    trap 'rm -f "${LOCK_FILE}"' EXIT
}

# Check if IKE SA is established (Phase 1)
check_ike_established() {
    local conn="$1"
    ipsec status 2>/dev/null | grep -q \
        "\"${conn}\".*ESTABLISHED_IKE_SA\|\"${conn}\".*STATE_MAIN_[IR][34]"
}

# Check if IPsec SA is active (Phase 2 -- actual encrypted traffic)
check_ipsec_sa_active() {
    local conn="$1"
    ipsec status 2>/dev/null | grep -q \
        "\"${conn}\".*ESTABLISHED_CHILD_SA\|\"${conn}\".*STATE_QUICK_[IR]2"
}

# Ping test through tunnel
check_tunnel_reachability() {
    if [[ -z "${TUNNEL_TEST_HOST}" ]]; then
        return 0
    fi
    ping -c 2 -W 5 "${TUNNEL_TEST_HOST}" &>/dev/null
}

# Collect diagnostics for troubleshooting
get_diagnostics() {
    echo "--- ipsec status ---"
    ipsec status 2>/dev/null || echo "(ipsec status failed)"
    echo "--- ipsec trafficstatus ---"
    ipsec trafficstatus 2>/dev/null || echo "(no traffic)"
    echo "--- ip xfrm state ---"
    ip xfrm state 2>/dev/null || echo "(no xfrm state)"
    echo "--- end diagnostics ---"
}

bring_up_connection() {
    local conn="$1"
    log "Bringing up connection '${conn}'"
    ipsec up "${conn}" 2>&1 | while IFS= read -r line; do
        log "  ipsec up: ${line}"
    done
}

# --- Main ---

acquire_lock
touch "${LOG_FILE}"
chmod 640 "${LOG_FILE}"

# Check if ipsec service is running
if ! systemctl is-active --quiet ipsec 2>/dev/null; then
    log "WARNING: ipsec service is not running, starting it"
    systemctl start ipsec
    sleep "${RESTART_WAIT}"
    if ! systemctl is-active --quiet ipsec 2>/dev/null; then
        alert "IPsec service failed to start on VPN gateway. Manual intervention required."
        exit 1
    fi
fi

# Check primary tunnel
if check_ipsec_sa_active "${PRIMARY_CONN}"; then
    if check_tunnel_reachability; then
        log "OK: Tunnel '${PRIMARY_CONN}' is UP with active IPsec SA"
        exit 0
    else
        log "WARNING: Tunnel has SA but ping to ${TUNNEL_TEST_HOST} failed"
    fi
elif check_ike_established "${PRIMARY_CONN}"; then
    log "WARNING: IKE established but no IPsec SA (Phase 2 issue)"
else
    log "WARNING: Tunnel '${PRIMARY_CONN}' is DOWN (no IKE SA)"
fi

# --- Recovery ---
log "WARNING: No active tunnel detected, collecting diagnostics"
get_diagnostics >> "${LOG_FILE}" 2>&1

for attempt in $(seq 1 "${MAX_RESTART_ATTEMPTS}"); do
    log "Restart attempt ${attempt}/${MAX_RESTART_ATTEMPTS}"

    if [[ ${attempt} -le 2 ]]; then
        # Attempts 1-2: connection-level restart (less disruptive)
        ipsec down "${PRIMARY_CONN}" 2>/dev/null || true
        sleep 3
        bring_up_connection "${PRIMARY_CONN}"
        sleep "${RESTART_WAIT}"
    else
        # Attempt 3: full service restart (nuclear option)
        log "Attempting full service restart"
        systemctl restart ipsec
        sleep "${RESTART_WAIT}"
    fi

    if check_ipsec_sa_active "${PRIMARY_CONN}"; then
        log "OK: Tunnel re-established on attempt ${attempt}"
        alert "IPsec tunnel was DOWN but recovered automatically (attempt ${attempt})."
        exit 0
    fi
done

# All attempts failed
alert "IPsec tunnel is DOWN. ${MAX_RESTART_ATTEMPTS} restart attempts failed. Manual intervention required."
exit 1

Make it executable:

chmod 750 /usr/local/bin/ipsec-monitor.sh

4.2 Configure the Webhook for Alerting

Store your Slack (or Teams) webhook URL:

echo "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" > /etc/ipsec.d/webhook_url
chmod 600 /etc/ipsec.d/webhook_url

4.3 Set Up the Cron Job

# Run monitoring every 2 minutes
echo "*/2 * * * * root /usr/local/bin/ipsec-monitor.sh" > /etc/cron.d/ipsec-monitor

4.4 Configure Log Rotation

Create /etc/logrotate.d/ipsec-monitor:

/var/log/ipsec-monitor.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}

4.5 How the Recovery Logic Works

The monitoring script uses a graduated recovery approach:

  1. Attempts 1-2: Connection-level restart – Runs ipsec down followed by ipsec up for the specific connection. This is the least disruptive option and handles most transient failures (DPD timeout, key renegotiation failure).
  2. Attempt 3: Full service restart – Restarts the entire IPsec service (systemctl restart ipsec). This reinitializes the IKE daemon and is needed when the daemon itself is in a bad state.
  3. Alert on failure – If all 3 attempts fail, sends a webhook alert to Slack with the hostname and failure details. At this point, manual intervention is required (the Azure side may be down, or there is a network issue).
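Stripped of the real ipsec calls, the control flow of this graduated recovery fits in a few lines of bash. This is only a sketch of the logic: the check and restart functions below are stubs standing in for the real `ipsec` and `systemctl` commands.

```shell
MAX_RESTART_ATTEMPTS=3

# Stubs for illustration - the real script calls ipsec/systemctl here.
check_tunnel()       { [ "${1}" -ge 3 ]; }   # pretend recovery succeeds on attempt 3
restart_connection() { echo "connection-level restart"; }
restart_service()    { echo "full service restart"; }

recovered=0
for attempt in $(seq 1 "${MAX_RESTART_ATTEMPTS}"); do
    if [ "${attempt}" -le 2 ]; then
        restart_connection            # attempts 1-2: least disruptive
    else
        restart_service               # attempt 3: nuclear option
    fi
    if check_tunnel "${attempt}"; then
        recovered=1
        echo "recovered on attempt ${attempt}"
        break
    fi
done
[ "${recovered}" -eq 1 ] || echo "all attempts failed, alerting"
```

The escalation order matters: a connection-level restart drops only one tunnel, while a service restart tears down every SA the daemon holds.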

Step 5: Exposing the Database to Kubernetes with EndpointSlice

Now that the VPN tunnel is up and PgBouncer is running on the gateway, the final step is making the database accessible to Kubernetes pods through a stable, DNS-discoverable service.

5.1 Why EndpointSlice?

Kubernetes Services typically point to pods running inside the cluster. But our database proxy (PgBouncer) runs on an external server (the VPN gateway). To make this work, we create:

  1. A Service with no pod selector that defines the DNS name and port (since there is no selector, Kubernetes will not manage its endpoints automatically).
  2. An EndpointSlice that tells Kubernetes the actual IP address to route traffic to.

This combination gives pods a ClusterIP-like experience: they connect to citicorp-postgres.stage.svc.cluster.local:5432, and Kubernetes routes the traffic to the VPN gateway.

Why EndpointSlice instead of the older Endpoints resource? EndpointSlice is the modern replacement for Endpoints. It scales better (endpoints are split across slices of up to 100 entries by default, instead of one ever-growing Endpoints object), supports dual-stack networking, and is the resource that kube-proxy actually watches in modern Kubernetes versions.

5.2 Create the Kubernetes Manifests

Create a file vpn-db-service.yaml:

# Kubernetes Service + EndpointSlice for Azure PostgreSQL via VPN gateway
#
# Traffic flow:
#   k3s pod --> this Service --> vpn-gw:5432 (PgBouncer) --> IPsec tunnel --> Azure PostgreSQL

apiVersion: v1
kind: Service
metadata:
  name: citicorp-postgres
  namespace: stage
  labels:
    app: citicorp-postgres
    component: database-proxy
spec:
  ports:
    - name: postgresql
      port: 5432
      targetPort: 5432
      protocol: TCP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: citicorp-postgres-1
  namespace: stage
  labels:
    app: citicorp-postgres
    component: database-proxy
    # This label is CRITICAL - it links the EndpointSlice to the Service.
    # Without it, the Service will not route traffic to the endpoints.
    kubernetes.io/service-name: citicorp-postgres
addressType: IPv4
ports:
  - name: postgresql
    port: 5432
    protocol: TCP
endpoints:
  - addresses:
      - 46.110.80.45 # vpn-gw public IP

5.3 Apply the Manifests

kubectl apply -f vpn-db-service.yaml

5.4 Verify the Setup

# Check the Service was created
kubectl get svc citicorp-postgres -n stage

# Check the EndpointSlice
kubectl get endpointslice -n stage -l kubernetes.io/service-name=citicorp-postgres

# Verify DNS resolution from within a pod
kubectl run dns-test --rm -it --image=busybox --namespace=stage -- \
  nslookup citicorp-postgres.stage.svc.cluster.local

# Test the full connection chain: pod -> Service -> VPN gateway -> IPsec -> Azure PostgreSQL
kubectl run pg-test --rm -it --image=postgres:15 --namespace=stage -- \
  psql -h citicorp-postgres -U spin_admin -d statement_analyzer_uat -c "SELECT 1"

5.5 Understanding the Key Label

The most important detail in the EndpointSlice is the label:

kubernetes.io/service-name: citicorp-postgres

This label is how Kubernetes associates an EndpointSlice with a Service. When kube-proxy sets up iptables/IPVS rules for the citicorp-postgres Service, it looks for EndpointSlices with this label. Without it, your Service will have no backends and connections will fail.

5.6 How Pods Connect

With this setup in place, application pods connect using standard PostgreSQL connection strings:

postgresql://spin_admin:<password>@citicorp-postgres.stage.svc.cluster.local:5432/statement_analyzer_uat

Or via environment variables:

env:
  - name: DATABASE_HOST
    value: "citicorp-postgres.stage.svc.cluster.local"
  - name: DATABASE_PORT
    value: "5432"
  - name: DATABASE_NAME
    value: "statement_analyzer_uat"
  - name: DATABASE_USER
    valueFrom:
      secretKeyRef:
        name: citicorp-postgres-creds
        key: username
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: citicorp-postgres-creds
        key: password

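The secretKeyRef entries above assume a Secret named citicorp-postgres-creds exists in the stage namespace. A minimal sketch of that Secret follows; the password is a placeholder to be injected from your secrets manager at deploy time, never committed to version control:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: citicorp-postgres-creds
  namespace: stage
type: Opaque
stringData:                 # stringData lets you write plain values; base64 is handled for you
  username: spin_admin
  password: REPLACE_ME      # placeholder - source from a secrets manager
```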
Step 6: Automating Everything with SaltStack

All of the above can be automated with SaltStack. Here is the structure of the Salt states used:

salt/
├── salt-states/
│   ├── ipsec/                         # IPsec VPN tunnel state
│   │   ├── init.sls                   # Libreswan install, sysctl, config, service
│   │   ├── citicorp.conf.j2           # IPsec connection template
│   │   ├── citicorp.secrets.j2        # PSK template
│   │   ├── ipsec-monitor.sh           # Monitoring script (Jinja template)
│   │   └── ipsec-monitor-logrotate.conf
│   ├── pgbouncer-vpn/                 # PgBouncer for VPN proxy mode
│   │   ├── init.sls                   # Install, config, systemd service
│   │   ├── pgbouncer.ini.j2           # PgBouncer config template
│   │   ├── pgbouncer.service          # Systemd unit file
│   │   └── userlist.txt.j2            # Credentials template
│   ├── vpn-tools/                     # Diagnostic tools
│   │   └── init.sls                   # psql, tcpdump, nmap, traceroute, etc.
│   ├── firewall/                      # Firewall rules (includes vpn-gw section)
│   │   └── init.sls
│   ├── grains/                        # Role assignments
│   │   └── init.sls                   # vpn-gw* hostname -> vpn-gw role
│   └── top.sls                        # State assignments by role
└── pillar/
    └── common.sls                     # All VPN configuration values

6.1 The Top File (State Assignments)

The top.sls file assigns states to servers based on their grain role:

# VPN Gateway (role: vpn-gw)
"roles:vpn-gw":
  - match: grain
  - ipsec # IPsec VPN tunnel (Libreswan)
  - pgbouncer-vpn # PgBouncer proxying to Azure PostgreSQL
  - vpn-tools # Troubleshooting tools (psql, tcpdump, etc.)
  - node_exporter # Prometheus monitoring

6.2 Grain Assignment

Servers with hostnames matching vpn-gw* or vpn* are automatically assigned the vpn-gw role:

{% elif minion_id.startswith('vpn-gw') or minion_id.startswith('vpn') %}
set_vpn_gw_role:
  grains.present:
    - name: roles
    - value:
      - vpn-gw
    - force: True

6.3 Applying the States

To deploy the full VPN gateway stack:

# Apply grain assignment first
salt 'vpn-gw' state.apply grains

# Apply all vpn-gw states
salt 'vpn-gw' state.apply

# Or apply individually for a more controlled rollout
salt 'vpn-gw' state.apply ipsec
salt 'vpn-gw' state.apply pgbouncer-vpn
salt 'vpn-gw' state.apply vpn-tools

Salt handles the dependency ordering automatically – PgBouncer will only start after the IPsec service is running, and the monitoring cron job is created after the monitoring script is deployed.
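That ordering is expressed with requisites inside the states. A simplified sketch of what the pgbouncer-vpn state declares is shown below; the state IDs are illustrative, not the exact ones from the repo:

```yaml
pgbouncer-service:
  service.running:
    - name: pgbouncer
    - enable: True
    - require:
        - service: ipsec-service               # do not start until the tunnel service is up
    - watch:
        - file: /etc/pgbouncer/pgbouncer.ini   # restart PgBouncer when the config changes
```

The require requisite enforces start order across states, while watch additionally restarts the service whenever the templated config is re-rendered.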

Troubleshooting

Tunnel Will Not Establish

# Check IPsec logs
journalctl -u ipsec -f

# Verify firewall allows IKE/NAT-T/ESP
firewall-cmd --list-rich-rules

# Check the PSK matches on both sides
cat /etc/ipsec.d/citicorp.secrets

# Verify the Azure VPN Gateway configuration matches your parameters
ipsec status

Common causes:

  • PSK mismatch between local and Azure sides
  • Firewall blocking UDP 500/4500 or ESP
  • IKE version mismatch (IKEv1 vs IKEv2)
  • Encryption algorithm mismatch

Tunnel Up but No Traffic

# Check kernel XFRM state (should show SA entries)
ip xfrm state

# Check XFRM policies (should show routing rules)
ip xfrm policy

# Verify routing
ip route get 10.70.0.25

Common causes:

  • leftsubnet / rightsubnet mismatch with Azure traffic selectors
  • Missing IP forwarding (sysctl net.ipv4.ip_forward)
  • Reverse path filtering dropping decapsulated packets
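For the last two causes, the gateway needs IP forwarding enabled and strict reverse-path filtering relaxed. A sketch of the relevant sysctl settings follows; the drop-in filename is an example, apply it with sysctl --system:

```
# /etc/sysctl.d/99-ipsec.conf (example filename)

# Forward traffic between the k3s side and the tunnel
net.ipv4.ip_forward = 1

# Strict rp_filter (1) can drop packets that arrive decapsulated from the
# tunnel; loose mode (2) avoids this while keeping some spoofing protection
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
```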

PgBouncer Cannot Connect to Azure PostgreSQL

# Verify tunnel is up first
ipsec trafficstatus

# Test direct connection to Azure PG through the tunnel
psql -h 10.70.0.25 -p 5432 -U spin_admin -d statement_analyzer_uat

# Check PgBouncer logs
journalctl -u pgbouncer -f
tail -f /var/log/pgbouncer/pgbouncer.log

# Verify PgBouncer config
cat /etc/pgbouncer/pgbouncer.ini

Pods Cannot Reach the VPN Gateway

# Verify the Service and EndpointSlice exist
kubectl get svc citicorp-postgres -n stage
kubectl get endpointslice -n stage -l kubernetes.io/service-name=citicorp-postgres

# Check DNS resolution
kubectl run dns-test --rm -it --image=busybox --namespace=stage -- \
  nslookup citicorp-postgres

# Verify firewall on vpn-gw allows the k3s node IP
firewall-cmd --list-rich-rules | grep 5432

# Test TCP connectivity from a k3s node
nc -zv 46.110.80.45 5432

Security Considerations

  1. Pre-shared key strength: Use a randomly generated PSK of at least 32 characters. Rotate it periodically and coordinate the rotation with the Azure VPN Gateway administrator.

  2. DH Group: This setup uses DH Group 14 (2048-bit MODP) for key exchange. DH Group 2 (1024-bit) is deprecated by NIST and should be avoided. If the Azure side requires Group 2, flag this as a security improvement request.

  3. Firewall principle of least privilege: Only the specific IPs that need access (Azure gateway IPs for IPsec, k3s node IPs for PgBouncer) are allowed. No ports are open to the general internet.

  4. PgBouncer credentials: Store credentials in a secrets manager (e.g., HashiCorp Vault) and inject them into the pillar or userlist.txt. Avoid storing plaintext passwords in version control.

  5. Systemd hardening: PgBouncer runs with ProtectSystem=strict, ProtectHome=true, NoNewPrivileges=true, and PrivateTmp=true. These limit what a compromised PgBouncer process could do.

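For reference, the hardening options from point 5 live in the [Service] section of the PgBouncer systemd unit. This is a minimal sketch showing only those directives; ReadWritePaths is an assumption added here because ProtectSystem=strict would otherwise block writes to the log path used earlier:

```ini
[Service]
# Mount /usr, /boot and /etc read-only for this service
ProtectSystem=strict
# Hide /home, /root and /run/user from the process
ProtectHome=true
# The process and its children can never gain new privileges
NoNewPrivileges=true
# Give the service a private /tmp
PrivateTmp=true
# Carve out the one writable path PgBouncer needs (assumed; matches the log location)
ReadWritePaths=/var/log/pgbouncer
```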
Conclusion

This architecture creates a secure, reliable, and maintainable bridge between Kubernetes workloads on Hetzner Cloud and an Azure PostgreSQL database behind a private endpoint. The key components work together:

  • Libreswan IPsec provides the encrypted network tunnel to Azure
  • PgBouncer handles connection pooling, reducing load on the database and shielding pods from VPN latency
  • Kubernetes EndpointSlice gives pods a clean, DNS-discoverable database endpoint
  • Automated monitoring detects tunnel failures and attempts recovery before alerting
  • SaltStack automates the entire deployment, ensuring consistency and repeatability

Each layer adds a specific value: the tunnel provides security, PgBouncer provides performance, EndpointSlice provides discoverability, monitoring provides reliability, and Salt provides automation. Together, they turn a complex cross-cloud database connectivity problem into a standard postgresql://host:port/db connection string for application developers.
