# Architecture
This document explains the design decisions behind Arvandor.
## Network Separation
### Why Two Networks?
```
Internet ──► Proxmox Host ──► vmbr1 (192.168.100.0/24)
                         └──► Nebula (10.10.10.0/24)
```
**Bridge Network (vmbr1)**
- Used only for Terraform provisioning and Ansible access
- VM firewalls block all bridge traffic except from the Proxmox host
- No inter-VM communication on this network

**Nebula Overlay**
- All application traffic uses encrypted Nebula tunnels
- Group-based firewall rules for segmentation
- Works across any network boundary (cloud, datacenter, home)
### Benefits
1. **Defense in depth** - Compromise of the bridge network doesn't expose services
2. **Migration ready** - Move VMs anywhere, Nebula handles connectivity
3. **Zero-trust** - VMs authenticate via certificates, not network position
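
To make the separation concrete, a quick smoke test along the following lines can confirm that a service answers on its Nebula address but not on its bridge address. This is a minimal sketch: the IP addresses and port are illustrative placeholders, not values defined by this document.

```python
"""Check that a service is reachable over Nebula but not over the bridge.

The addresses and port below are examples only.
"""
import socket

NEBULA_IP = "10.10.10.30"     # assumed Nebula address of a data VM
BRIDGE_IP = "192.168.100.30"  # assumed bridge address of the same VM
PORT = 5432                   # assumed service port (PostgreSQL)


def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    assert reachable(NEBULA_IP, PORT), "service should answer on the Nebula overlay"
    assert not reachable(BRIDGE_IP, PORT), "bridge traffic should be blocked"
    print("network separation holds for this service")
```
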
## VMID Allocation
VMIDs follow a logical pattern:

| Range | Purpose | Example |
|-------|---------|---------|
| 1000-1999 | Management | DNS, Caddy |
| 2000-2999 | Services | Vault, Gitea |
| 3000-3999 | Data | PostgreSQL, Valkey |
| 4000-4999 | Workloads | Applications |
| 5000-5999 | Monitoring | Prometheus |
The VMID also determines the last octet of the VM's IP address (see the sketch below):
- VMID 1001 → x.x.x.11
- VMID 3000 → x.x.x.30
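
One possible reading of this scheme, shown purely as a sketch: the range digit supplies the tens of the final octet and the VMID's trailing digits supply the rest. The two documented examples are consistent with this, but the actual convention may differ.

```python
def last_octet(vmid: int) -> int:
    """Illustrative VMID-to-last-octet mapping (assumed, not authoritative).

    Range digit (thousands place) times ten, plus the VMID's last two digits.
    """
    return (vmid // 1000) * 10 + (vmid % 100)


# The two examples documented above.
assert last_octet(1001) == 11
assert last_octet(3000) == 30
```
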
## High Availability
All data services run as 3-node clusters:
### PostgreSQL (Patroni + etcd)
```
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ postgres-01 │  │ postgres-02 │  │ postgres-03 │
│ Leader      │◄─│ Replica     │◄─│ Replica     │
│ + etcd      │  │ + etcd      │  │ + etcd      │
└─────────────┘  └─────────────┘  └─────────────┘
```
- Patroni handles leader election
- etcd provides distributed consensus
- Automatic failover on leader failure
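
Clients do not need to track which node is currently the leader: libpq's multi-host connection strings can select it at connect time. A minimal sketch, assuming psycopg2 and placeholder hostnames, database name, and credentials:

```python
"""Connect to whichever node is currently the Patroni leader.

libpq tries each host in turn; `target_session_attrs=read-write`
skips read-only replicas. Hostnames, database, and credentials are
placeholders, not values defined by this document.
"""
import psycopg2

DSN = (
    "host=postgres-01.nebula,postgres-02.nebula,postgres-03.nebula "
    "port=5432 dbname=appdb user=app password=change-me "
    "target_session_attrs=read-write"
)

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT pg_is_in_recovery()")
        in_recovery, = cur.fetchone()
        print("connected to leader:", not in_recovery)
```
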
### Valkey (Sentinel)
```
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ valkey-01   │  │ valkey-02   │  │ valkey-03   │
│ Master      │──│ Replica     │  │ Replica     │
│ + Sentinel  │  │ + Sentinel  │  │ + Sentinel  │
└─────────────┘  └─────────────┘  └─────────────┘
```
- Sentinel monitors master health
- Automatic promotion on master failure
- ACL-based per-service key isolation
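
Because Valkey keeps Redis protocol compatibility, a Sentinel-aware client can discover the current master on its own. A sketch using redis-py; the master-set name, the ACL credentials, and the default Sentinel port 26379 are assumptions:

```python
"""Discover and use the current Valkey master via Sentinel (redis-py).

Master-set name and ACL credentials are illustrative placeholders.
"""
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [
        ("valkey-01.nebula", 26379),
        ("valkey-02.nebula", 26379),
        ("valkey-03.nebula", 26379),
    ],
    socket_timeout=0.5,
)

# master_for() resolves whichever node Sentinel currently reports as
# master, so a failover is transparent to the application.
cache = sentinel.master_for(
    "valkey-primary",
    username="svc-example",   # per-service ACL user
    password="change-me",
    socket_timeout=0.5,
)
cache.set("healthcheck", "ok")
print(cache.get("healthcheck"))
```
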
### Vault (Raft)
```
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ vault-01    │  │ vault-02    │  │ vault-03    │
│ Leader      │──│ Standby     │──│ Standby     │
└─────────────┘  └─────────────┘  └─────────────┘
```
- Integrated Raft storage (no external backend)
- Automatic leader election
- Unseal required after restart
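
A small sweep over the three nodes shows the seal and leadership state the cluster maintains; the node URLs and the use of the hvac client are assumptions for illustration:

```python
"""Report seal and leadership status for each Vault node (hvac client).

Node addresses are illustrative; 8200 is Vault's default API port.
"""
import hvac

NODES = [
    "http://vault-01.nebula:8200",
    "http://vault-02.nebula:8200",
    "http://vault-03.nebula:8200",
]

for url in NODES:
    client = hvac.Client(url=url)
    seal = client.sys.read_seal_status()      # sealed nodes need unsealing after a restart
    leader = client.sys.read_leader_status()  # which node currently holds leadership
    print(url, "sealed:", seal["sealed"], "leader:", leader["is_self"])
```
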
## Security Model
### Three-Layer Firewall
```
┌─────────────────────────────────────────────────┐
│ 1. Proxmox VM Firewall → Egress control         │
│ 2. Nebula Groups       → East-west segmentation │
│ 3. Guest iptables      → Defense in depth       │
└─────────────────────────────────────────────────┘
```
### Nebula Groups
| Group | Can Access |
|-------|------------|
| admin | Everything |
| infrastructure | infrastructure |
| projects | infrastructure |
| games | Nothing (isolated) |
### Vault Integration
Applications use Vault for:
- Dynamic database credentials (short-lived)
- Service secrets (API keys, etc.)
- AppRole authentication
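
From an application's point of view the flow looks roughly like the sketch below; the role names, environment variables, and Vault address are assumptions, not values defined in this document.

```python
"""AppRole login, then request short-lived database credentials (hvac).

Role name, env var names, and the Vault address are placeholders.
"""
import os

import hvac

client = hvac.Client(url="http://vault-01.nebula:8200")

# AppRole authentication: role_id/secret_id are delivered out of band
# by provisioning, never hard-coded in the application.
client.auth.approle.login(
    role_id=os.environ["VAULT_ROLE_ID"],
    secret_id=os.environ["VAULT_SECRET_ID"],
)

# Dynamic database credentials: Vault creates a short-lived database
# user and returns it with a lease.
creds = client.secrets.database.generate_credentials(name="app-role")
print("username:", creds["data"]["username"])
print("lease TTL:", creds["lease_duration"], "seconds")
```
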
## Service Discovery
Internal DNS provides hostname resolution:
```
<hostname>.nebula → Nebula IP
```
VMs query 10.10.10.11 (the DNS server) via Nebula. External queries are forwarded to Cloudflare (1.1.1.1).
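
For example, a resolver pointed directly at the internal DNS server returns the Nebula address for a service name. The hostname queried below is an example; the use of dnspython is an assumption:

```python
"""Resolve a .nebula name against the internal DNS server (dnspython).

10.10.10.11 is the DNS server address given above; the hostname is an example.
"""
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)  # don't read /etc/resolv.conf
resolver.nameservers = ["10.10.10.11"]

answer = resolver.resolve("vault-01.nebula", "A")
for record in answer:
    print(record.address)  # expected to be a 10.10.10.0/24 address
```
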
## Provisioning Flow
```
1. terraform apply → Create VM
2. bootstrap.yml → Update packages
3. security.yml → Configure firewall
4. nebula.yml → Join overlay network
5. <service>.yml → Deploy service
6. data-service.yml → Provision credentials
```
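
The same sequence can be scripted end to end. The sketch below assumes an inventory file, a target host, and a service playbook name that are not specified in this document, so treat it as illustrative rather than the project's actual tooling:

```python
"""Drive the provisioning flow for one VM (illustrative wrapper).

Inventory path, target host, and the service playbook name are assumptions.
"""
import subprocess

HOST = "postgres-01"      # example target from the diagrams above
PLAYBOOKS = [
    "bootstrap.yml",      # update packages
    "security.yml",       # configure firewall
    "nebula.yml",         # join overlay network
    "postgresql.yml",     # deploy the service (name assumed)
    "data-service.yml",   # provision credentials
]


def run(cmd: list[str]) -> None:
    """Echo a command, run it, and stop on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


run(["terraform", "apply", "-auto-approve"])
for playbook in PLAYBOOKS:
    run(["ansible-playbook", "-i", "inventory.yml", "-l", HOST, playbook])
```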