Architecture

This document explains the design decisions behind Arvandor.

Network Separation

Why Two Networks?

Internet ──► Proxmox Host ──► vmbr1 (192.168.100.0/24)
                  │
                  └──► Nebula (10.10.10.0/24)

Bridge Network (vmbr1)

  • Used only for Terraform provisioning and Ansible access
  • Each VM's firewall blocks all bridge traffic except from the Proxmox host (see the sketch after this list)
  • No inter-VM communication on this network
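
One way this restriction could be enforced from the guest side is sketched below. This is only an illustration: the interface name (eth0), the Proxmox host address (192.168.100.1), and SSH as the sole permitted service are assumptions, not values taken from the repo.

```yaml
# Hypothetical excerpt from security.yml; interface, address, and port are assumptions.
- name: Allow return traffic for established connections on the bridge
  ansible.builtin.iptables:
    chain: INPUT
    in_interface: eth0
    ctstate: [ESTABLISHED, RELATED]
    jump: ACCEPT

- name: Allow SSH from the Proxmox host only
  ansible.builtin.iptables:
    chain: INPUT
    in_interface: eth0
    source: 192.168.100.1
    protocol: tcp
    destination_port: "22"
    jump: ACCEPT

- name: Drop everything else arriving on the bridge interface
  ansible.builtin.iptables:
    chain: INPUT
    in_interface: eth0
    jump: DROP
```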

Nebula Overlay

  • All application traffic uses encrypted Nebula tunnels
  • Group-based firewall rules for segmentation
  • Works across any network boundary (cloud, datacenter, home)
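
A minimal sketch of what a per-host Nebula config could look like, assuming the lighthouse sits at overlay address 10.10.10.1 behind a public name and certificates live under /etc/nebula; none of these values are taken from the repo.

```yaml
# Minimal Nebula host config sketch (paths and addresses are assumptions).
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/host.crt
  key: /etc/nebula/host.key
static_host_map:
  "10.10.10.1": ["lighthouse.example.com:4242"]
lighthouse:
  am_lighthouse: false
  hosts:
    - "10.10.10.1"
listen:
  host: 0.0.0.0
  port: 4242
tun:
  dev: nebula1
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound: []    # group-based rules are shown under "Nebula Groups" below
```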

Benefits

  1. Defense in depth - Compromise of bridge network doesn't expose services
  2. Migration ready - Move VMs anywhere, Nebula handles connectivity
  3. Zero-trust - VMs authenticate via certificates, not network position

VMID Allocation

VMIDs follow a logical pattern:

| Range     | Purpose    | Example            |
|-----------|------------|--------------------|
| 1000-1999 | Management | DNS, Caddy         |
| 2000-2999 | Services   | Vault, Gitea       |
| 3000-3999 | Data       | PostgreSQL, Valkey |
| 4000-4999 | Workloads  | Applications       |
| 5000-5999 | Monitoring | Prometheus         |

The VMID also determines the host octet of the IP address:

  • VMID 1001 → x.x.x.11
  • VMID 3000 → x.x.x.30
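
For illustration, the convention could be captured in inventory variables along these lines; the hostnames and variable names are hypothetical, not the real inventory.

```yaml
# Hypothetical inventory vars illustrating the VMID/IP convention above.
arvandor_vms:
  dns-01:
    vmid: 1001
    bridge_ip: 192.168.100.11
    nebula_ip: 10.10.10.11
  postgres-01:
    vmid: 3000
    bridge_ip: 192.168.100.30
    nebula_ip: 10.10.10.30
```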

High Availability

All data services run as 3-node clusters:

PostgreSQL (Patroni + etcd)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ postgres-01 │  │ postgres-02 │  │ postgres-03 │
│   Leader    │◄─│  Replica    │◄─│  Replica    │
│  + etcd     │  │  + etcd     │  │  + etcd     │
└─────────────┘  └─────────────┘  └─────────────┘
  • Patroni handles leader election
  • etcd provides distributed consensus
  • Automatic failover on leader failure
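
An abbreviated patroni.yml for the first node might look like the sketch below; the cluster scope, addresses, and data directory are assumptions for illustration only.

```yaml
# Abbreviated Patroni config sketch for postgres-01 (values are assumptions).
scope: arvandor
name: postgres-01
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.10.10.30:8008
etcd3:
  hosts:
    - 10.10.10.30:2379
    - 10.10.10.31:2379
    - 10.10.10.32:2379
bootstrap:
  dcs:
    ttl: 30            # leader key TTL; failover begins when it expires
    loop_wait: 10
    retry_timeout: 10
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.10.10.30:5432
  data_dir: /var/lib/postgresql/17/main
```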

Valkey (Sentinel)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  valkey-01  │  │  valkey-02  │  │  valkey-03  │
│   Master    │──│  Replica    │  │  Replica    │
│ + Sentinel  │  │ + Sentinel  │  │ + Sentinel  │
└─────────────┘  └─────────────┘  └─────────────┘
  • Sentinel monitors master health
  • Automatic promotion on master failure
  • ACL-based per-service key isolation
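
A hedged sketch of role variables such a deployment might consume: the master name, quorum, service names, and key prefixes are illustrative, and the rules use standard Valkey ACL syntax.

```yaml
# Hypothetical Valkey role vars; names and values are illustrative only.
valkey_sentinel:
  master_name: arvandor-primary
  quorum: 2              # two Sentinels must agree before a failover
valkey_acl_users:
  - name: gitea
    keys: "~gitea:*"     # may only touch keys under its own prefix
    commands: "+@read +@write -@admin"
  - name: example-app
    keys: "~example:*"
    commands: "+@read +@write -@admin"
```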

Vault (Raft)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  vault-01   │  │  vault-02   │  │  vault-03   │
│   Leader    │──│  Standby    │──│  Standby    │
└─────────────┘  └─────────────┘  └─────────────┘
  • Integrated Raft storage (no external backend)
  • Automatic leader election
  • Unseal required after restart
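
Because Vault seals itself on restart, an operator (or a helper task like the hypothetical one below) must submit unseal key shares before the node rejoins the cluster; the address and variable name are assumptions.

```yaml
# Hypothetical unseal helper; vault_unseal_keys and the address are assumptions.
- name: Submit unseal key shares after a restart
  ansible.builtin.command:
    cmd: "vault operator unseal {{ item }}"
  environment:
    VAULT_ADDR: "https://10.10.10.20:8200"
  loop: "{{ vault_unseal_keys }}"
  no_log: true           # never print key shares in Ansible output
```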

Security Model

Three-Layer Firewall

┌─────────────────────────────────────────────────────────────┐
│  1. Proxmox VM Firewall  →  Egress control                  │
│  2. Nebula Groups        →  East-west segmentation          │
│  3. Guest iptables       →  Defense in depth                │
└─────────────────────────────────────────────────────────────┘

Nebula Groups

| Group          | Can Access         |
|----------------|--------------------|
| admin          | Everything         |
| infrastructure | infrastructure     |
| projects       | infrastructure     |
| games          | Nothing (isolated) |
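
On each host, these groups translate into Nebula's built-in firewall rules. A sketch for an infrastructure node such as a PostgreSQL VM follows; the exposed port is an assumption for illustration.

```yaml
# Sketch of the firewall section of a Nebula host config on a PostgreSQL VM.
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    # admin can reach anything on this host
    - port: any
      proto: any
      group: admin
    # infrastructure peers and project workloads may only reach PostgreSQL
    - port: 5432
      proto: tcp
      group: infrastructure
    - port: 5432
      proto: tcp
      group: projects
```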

Vault Integration

Applications use Vault for:

  • Dynamic database credentials (short-lived)
  • Service secrets (API keys, etc.)
  • AppRole authentication
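
As an illustration of the pattern (not the repo's actual code), an Ansible task could authenticate with AppRole and request short-lived PostgreSQL credentials from the database secrets engine; the mount path, role name, address, and variable names are assumptions.

```yaml
# Hypothetical sketch: fetch dynamic DB credentials via AppRole.
- name: Request short-lived PostgreSQL credentials from Vault
  ansible.builtin.set_fact:
    db_creds: >-
      {{ lookup('community.hashi_vault.vault_read',
                'database/creds/gitea',
                url='https://10.10.10.20:8200',
                auth_method='approle',
                role_id=gitea_role_id,
                secret_id=gitea_secret_id) }}
    # db_creds.data then carries the generated username and password
```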

Service Discovery

Internal DNS provides hostname resolution:

<hostname>.nebula  →  Nebula IP

VMs query 10.10.10.11 (the DNS server) over Nebula. External queries are forwarded to Cloudflare (1.1.1.1).
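
A hypothetical sketch of the variables a DNS role might consume to serve that zone and forward everything else; the record names and addresses are illustrative.

```yaml
# Hypothetical DNS role vars; record names and addresses are illustrative.
dns_listen_address: 10.10.10.11
dns_local_zone: nebula
dns_records:
  - { name: vault-01, ip: 10.10.10.20 }
  - { name: postgres-01, ip: 10.10.10.30 }
dns_forwarders:
  - 1.1.1.1              # external queries go to Cloudflare
```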

Provisioning Flow

1. terraform apply     →  Create VM
2. bootstrap.yml       →  Update packages
3. security.yml        →  Configure firewall
4. nebula.yml          →  Join overlay network
5. <service>.yml       →  Deploy service
6. data-service.yml    →  Provision credentials
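
For orchestration, the Ansible steps could be chained in a single playbook along these lines; this is a sketch, and the repo may instead run each playbook individually.

```yaml
# Hypothetical site.yml chaining the steps after `terraform apply`;
# postgresql.yml stands in for the per-service playbook.
- import_playbook: bootstrap.yml
- import_playbook: security.yml
- import_playbook: nebula.yml
- import_playbook: postgresql.yml
- import_playbook: data-service.yml
```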