nexus-5-auth/PRODUCTION-DEPLOYMENT.md
2026-01-26 11:15:52 -05:00


# Production Deployment Checklist
## Architecture Overview
**Production Domains:**
- Frontend: `https://account.example.com`
- Oathkeeper Proxy: `https://auth.example.com` (port 4455)
- Django API: `https://api.example.com`
- Kratos: Internal only (ports 4433/4434)
- Oathkeeper API: Internal only (port 4456)

**All services run on the same VM**, so internal communication uses localhost or the Docker network.

---
## Pre-Deployment Checklist
### 1. Security Hardening
#### Kratos Secrets
```bash
# Generate new secrets for production (each command prints 32 hex characters)
openssl rand -hex 16 # SECRETS_DEFAULT
openssl rand -hex 16 # SECRETS_COOKIE
openssl rand -hex 16 # SECRETS_CIPHER (Kratos expects exactly 32 characters)
```
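If you would rather generate and check the values in one step, here is a minimal Python sketch using the standard `secrets` module. It assumes the length rules from the Kratos configuration reference (at least 16 characters for the default and cookie secrets, exactly 32 for the cipher secret). Confirm them for your Kratos version. Running the script prints `KEY=value` lines you can paste into `.env.production`.

```python
import secrets

# Assumed Kratos length rules: default/cookie secrets need at least 16
# characters, the cipher secret exactly 32.
MIN_LENGTH = {"SECRETS_DEFAULT": 16, "SECRETS_COOKIE": 16}
EXACT_LENGTH = {"SECRETS_CIPHER": 32}


def generate_kratos_secrets() -> dict[str, str]:
    """Return one fresh value per secret; token_hex(16) gives 32 hex characters."""
    names = [*MIN_LENGTH, *EXACT_LENGTH]
    return {name: secrets.token_hex(16) for name in names}


def validate_kratos_secrets(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means all lengths are valid."""
    problems = []
    for name, minimum in MIN_LENGTH.items():
        if len(values.get(name, "")) < minimum:
            problems.append(f"{name} must be at least {minimum} characters")
    for name, exact in EXACT_LENGTH.items():
        if len(values.get(name, "")) != exact:
            problems.append(f"{name} must be exactly {exact} characters")
    return problems


if __name__ == "__main__":
    generated = generate_kratos_secrets()
    problems = validate_kratos_secrets(generated)
    if problems:
        raise SystemExit("\n".join(problems))
    for key, value in generated.items():
        print(f"{key}={value}")
```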
Update in `nexus-5-auth-kratos/.env.production`:
- [ ] `SECRETS_DEFAULT` - New random value
- [ ] `SECRETS_COOKIE` - New random value
- [ ] `SECRETS_CIPHER` - New random value
#### Database Passwords
- [ ] Change `POSTGRES_PASSWORD` in `nexus-5-auth-kratos/.env.production`
- [ ] Update `KRATOS_DSN` with the new URL-encoded password
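
Passwords containing characters such as `@`, `:`, `/` or `#` break a DSN unless they are percent-encoded. The following minimal sketch uses `urllib.parse.quote`. The host `postgres` (assumed to be the Compose service name), database name and `sslmode` defaults are placeholders to adjust to your setup.

```python
from urllib.parse import quote


def build_kratos_dsn(user: str, password: str, host: str = "postgres",
                     port: int = 5432, database: str = "kratos",
                     sslmode: str = "disable") -> str:
    """Build a PostgreSQL DSN with user and password percent-encoded.

    The defaults are placeholders for a single-VM Docker setup.
    """
    # safe="" makes quote() encode "/" too, so no reserved character stays raw.
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}?sslmode={sslmode}")


print(build_kratos_dsn("kratos", "p@ss:w/rd#1"))
# postgres://kratos:p%40ss%3Aw%2Frd%231@postgres:5432/kratos?sslmode=disable
```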
#### SMTP Configuration
- [ ] Verify SMTP credentials in `nexus-5-auth-kratos/config/kratos.yml` (`courier.smtp.connection_uri`, line 128)
- [ ] Consider supplying the URI through the `COURIER_SMTP_CONNECTION_URI` environment variable instead of hardcoding it
### 2. SSL/TLS Configuration
#### Oathkeeper (https://auth.example.com)
- [ ] Configure reverse proxy (nginx/caddy) for SSL termination
- [ ] Install SSL certificate for `auth.example.com`
- [ ] Configure proxy to forward to `localhost:4455`
#### Frontend (https://account.example.com)
- [ ] Configure reverse proxy for SSL termination
- [ ] Install SSL certificate for `account.example.com`
- [ ] Configure proxy to forward to the SvelteKit server (adapter-node listens on port 3000 by default; 5173 is the Vite development server)
#### CORS Configuration
Verify Oathkeeper CORS is configured (`nexus-5-auth-oathkeeper/config/oathkeeper.yml`):
- [x] `https://account.example.com` in allowed_origins
- [x] `https://auth.example.com` in allowed_origins
- [x] `https://api.example.com` in allowed_origins
- [x] `allow_credentials: true`
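
To confirm the CORS configuration from outside the VM, send a preflight request and inspect the response. The minimal Python sketch below only checks the two headers that credentialed browser requests depend on. Call it as `cors_problems("https://account.example.com", preflight(url, "https://account.example.com"))`. The probe URL, for example `https://auth.example.com/sessions/whoami`, is an assumption: any path covered by your Oathkeeper rules should do. An empty list means both headers look right.

```python
import urllib.request


def cors_problems(origin: str, headers: dict[str, str]) -> list[str]:
    """Check preflight response headers for a credentialed cross-origin call."""
    lowered = {key.lower(): value for key, value in headers.items()}
    problems = []
    allowed = lowered.get("access-control-allow-origin")
    if allowed != origin:
        # With credentials, browsers reject "*": the exact origin must be echoed.
        problems.append(f"Allow-Origin is {allowed!r}, expected {origin!r}")
    if lowered.get("access-control-allow-credentials", "").lower() != "true":
        problems.append("Allow-Credentials is not 'true'")
    return problems


def preflight(url: str, origin: str, method: str = "GET") -> dict[str, str]:
    """Send an OPTIONS preflight request and return the response headers."""
    request = urllib.request.Request(url, method="OPTIONS", headers={
        "Origin": origin,
        "Access-Control-Request-Method": method,
    })
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```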
### 3. Environment Files
#### Replace .env files with production versions:
```bash
# Kratos
cp nexus-5-auth-kratos/.env.production nexus-5-auth-kratos/.env
# Oathkeeper
cp nexus-5-auth-oathkeeper/.env.production nexus-5-auth-oathkeeper/.env
# Frontend
cp nexus-5-auth-frontend/.env.production nexus-5-auth-frontend/.env
```
#### Verify all environment variables:
- [ ] `nexus-5-auth-kratos/.env`
- [ ] `nexus-5-auth-oathkeeper/.env`
- [ ] `nexus-5-auth-frontend/.env`
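
To catch empty values or leftover placeholders, you can parse each file and report problems. The minimal Python sketch below assumes simple `KEY=value` lines with no multi-line values or `export` prefixes. The required keys come from this checklist, and the placeholder markers are hypothetical examples to adapt. Run `check_file(path, REQUIRED[path])` for each file. An empty list means nothing suspicious was found.

```python
from pathlib import Path

# Keys this checklist expects; add entries for the other services as needed.
REQUIRED = {
    "nexus-5-auth-kratos/.env": [
        "SECRETS_DEFAULT", "SECRETS_COOKIE", "SECRETS_CIPHER",
        "POSTGRES_PASSWORD", "KRATOS_DSN",
    ],
}
# Hypothetical markers suggesting a value was never replaced (lower-case).
PLACEHOLDERS = ("changeme", "change_me", "todo")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def env_problems(values: dict[str, str], required: list[str]) -> list[str]:
    """Report missing/empty required keys and placeholder-looking values."""
    problems = [f"{key} is missing or empty" for key in required if not values.get(key)]
    for key, value in values.items():
        if any(marker in value.lower() for marker in PLACEHOLDERS):
            problems.append(f"{key} looks like a placeholder: {value!r}")
    return problems


def check_file(path: str, required: list[str]) -> list[str]:
    return env_problems(parse_env(Path(path).read_text()), required)
```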
---
## Deployment Steps
### 1. Database Setup
```bash
cd nexus-5-auth-kratos
# Start PostgreSQL
docker compose up -d postgres
# Wait for PostgreSQL to be ready
docker compose logs -f postgres
# Wait for "database system is ready to accept connections"
# Run Kratos migrations
docker compose run --rm kratos migrate sql -e --yes
```
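Instead of watching the logs by hand, you can poll until PostgreSQL reports that it is ready. The following minimal Python sketch runs `pg_isready` inside the `postgres` container. It assumes the script is executed from `nexus-5-auth-kratos/` and that the database user is `kratos`, as in the backup commands later in this document. Probing the published TCP port is not used because Docker's port proxy can accept connections before PostgreSQL itself is listening.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Call check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def postgres_ready(user: str = "kratos") -> bool:
    """Run pg_isready in the postgres container (exit code 0 means ready).

    Assumes the current directory holds the Kratos docker-compose.yml.
    """
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "postgres", "pg_isready", "-U", user],
        capture_output=True,
    )
    return result.returncode == 0
```

For example, `wait_until(postgres_ready, timeout=120)` returns `False` if PostgreSQL is still not ready after two minutes, so a deploy script can stop before running the migrations.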
### 2. Deploy Kratos
```bash
cd nexus-5-auth-kratos
# Build and start Kratos
docker compose up -d kratos
# Verify it's running
docker compose ps
docker compose logs kratos
# Test health endpoint
curl http://localhost:4433/health/ready
```
**Expected response:**
```json
{"status": "ok"}
```
### 3. Deploy Oathkeeper
```bash
cd nexus-5-auth-oathkeeper
# Rebuild with updated config
docker compose build oathkeeper
# Start Oathkeeper
docker compose up -d oathkeeper
# Verify it's running
docker compose ps
docker compose logs oathkeeper
# Test health endpoint
curl http://localhost:4456/health/ready
```
**Expected response:**
```json
{"status": "ok"}
```
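Both health checks can be scripted for CI or a post-deploy hook. The minimal Python sketch below uses only the standard library. It treats a `200` response whose JSON body has `"status": "ok"` as healthy, matching the expected responses above.

```python
import json
import urllib.request

# Internal health endpoints from the steps above.
ENDPOINTS = {
    "kratos": "http://localhost:4433/health/ready",
    "oathkeeper": "http://localhost:4456/health/ready",
}


def is_healthy(status: int, body: bytes) -> bool:
    """Healthy means HTTP 200 and a JSON object with status == "ok"."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return False


def check(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return is_healthy(response.status, response.read())
    except OSError:
        # URLError, HTTPError (e.g. 503 while not ready) and timeouts
        # are all OSError subclasses.
        return False
```

`{name: check(url) for name, url in ENDPOINTS.items()}` gives a per-service result that a deploy script can fail on.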
### 4. Test Access Rules
```bash
# List all configured rules
curl http://localhost:4456/rules | jq .
# Verify rule count (should be 9 rules)
curl -s http://localhost:4456/rules | jq 'length'
```
**Expected rules:**
1. `kratos:self-service`
2. `kratos:admin:identities`
3. `kratos:admin:recovery`
4. `kratos:admin:courier`
5. `kratos:admin:sessions`
6. `kratos:sessions:api`
7. `django:api:public`
8. `django:api:protected`
9. `django:api:v1`
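
Checking the count alone will not catch a rule that was renamed or replaced. The minimal Python sketch below compares rule IDs instead. It assumes `GET /rules` returns a JSON array of rule objects with an `id` field, as in the `jq` output above.

```python
import json
import urllib.request

EXPECTED_RULES = {
    "kratos:self-service",
    "kratos:admin:identities",
    "kratos:admin:recovery",
    "kratos:admin:courier",
    "kratos:admin:sessions",
    "kratos:sessions:api",
    "django:api:public",
    "django:api:protected",
    "django:api:v1",
}


def diff_rules(rules: list[dict]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) rule IDs compared with EXPECTED_RULES."""
    actual = {rule.get("id") for rule in rules}
    return EXPECTED_RULES - actual, actual - EXPECTED_RULES


def fetch_rules(url: str = "http://localhost:4456/rules") -> list[dict]:
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)
```

`diff_rules(fetch_rules())` should return two empty sets. Anything else names exactly which rules are missing or unexpected.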
### 5. Deploy Frontend
#### Option A: Docker Deployment (Recommended)
```bash
cd nexus-5-auth-frontend
# Copy production environment
cp .env.production .env
# Ensure ory-network exists
docker network create ory-network 2>/dev/null || true
# Build and start
docker compose up -d
# Verify it's running
docker compose ps
docker compose logs frontend
# Check that the server responds
curl http://localhost:3000/
```
**Expected response:** HTML page content
#### Option B: PM2 Deployment
```bash
cd nexus-5-auth-frontend
# Install dependencies
npm install
# Copy production environment
cp .env.production .env
# Build for production
npm run build
# Deploy with PM2 (assumes package.json defines a "start" script, e.g. "node build")
pm2 start npm --name "nexus-auth-frontend" -- start
pm2 save
```
#### Option C: Direct Node Deployment
```bash
cd nexus-5-auth-frontend
# Install dependencies
npm install
# Copy production environment
cp .env.production .env
# Build for production
npm run build
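# Start with node (adapter-node does not load .env in production; Node 20.6+
# can load it with --env-file, older versions can use `node -r dotenv/config build`)
node --env-file=.env build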
```
### 6. Configure Reverse Proxy
#### Example Nginx Configuration
**File: `/etc/nginx/sites-available/auth.example.com`**
```nginx
server {
listen 443 ssl http2;
server_name auth.example.com;
ssl_certificate /path/to/ssl/cert.pem;
ssl_certificate_key /path/to/ssl/key.pem;
location / {
proxy_pass http://localhost:4455;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (if needed)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
```
**File: `/etc/nginx/sites-available/account.example.com`**
```nginx
server {
listen 443 ssl http2;
server_name account.example.com;
ssl_certificate /path/to/ssl/cert.pem;
ssl_certificate_key /path/to/ssl/key.pem;
location / {
proxy_pass http://localhost:3000; # Adjust to your SvelteKit port
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
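# WebSocket upgrade headers are only needed for Vite HMR during development;
# the production server does not need them unless your app uses WebSockets
# proxy_http_version 1.1;
# proxy_set_header Upgrade $http_upgrade;
# proxy_set_header Connection "upgrade";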
}
}
```
Enable sites and reload nginx:
```bash
sudo ln -s /etc/nginx/sites-available/auth.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/account.example.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```
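Once nginx serves both sites, a short script can confirm that each certificate validates against the system trust store and report how many days remain before it expires. The minimal Python sketch below uses the standard `ssl` and `socket` modules. Call `check_certificate("auth.example.com")` and `check_certificate("account.example.com")`, and warn when the result drops below a threshold you choose (for example 14 days, an arbitrary choice).

```python
import socket
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a notAfter timestamp such as "Jan  5 09:34:43 2018 GMT"."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def check_certificate(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Open a verified TLS connection and return days until the cert expires.

    Raises ssl.SSLCertVerificationError if the chain or hostname is invalid.
    """
    context = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_remaining(cert["notAfter"])
```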
---
## Post-Deployment Testing
### 1. Health Checks
```bash
# Kratos public API
curl https://auth.example.com/health/ready
# Kratos admin API through Oathkeeper: without credentials this should be rejected (typically 401)
curl -i https://auth.example.com/admin/identities
# Oathkeeper API (internal)
curl http://localhost:4456/health/ready
```
### 2. Frontend Testing
Visit `https://account.example.com` and test:
- [ ] Registration flow
- [ ] Login flow
- [ ] Email verification
- [ ] Password recovery
- [ ] Settings page
- [ ] Admin dashboard (identity management)
- [ ] Session management
- [ ] Logout
### 3. WebAuthn Testing
- [ ] Register with passkey/security key
- [ ] Login with passkey/security key
- [ ] TOTP (authenticator app) setup
- [ ] TOTP login
### 4. API Testing
Test Django integration (once you have an authenticated session):
```bash
# Public API (no auth)
curl https://auth.example.com/api/public/
# Protected API (cookies.txt must contain a valid ory_kratos_session cookie)
curl -b cookies.txt https://auth.example.com/api/protected/
# Bearer token API
curl -H "Authorization: Bearer YOUR_TOKEN" https://auth.example.com/api/v1/
```
### 5. Verify Headers Forwarded to Django
Create a test identity and check the headers Django receives.

**Expected headers from Oathkeeper:**
```
X-User-ID: <kratos_identity_id>
X-User-Email: user@example.com
X-User-First-Name: John
X-User-Last-Name: Doe
X-User-Phone: +1234567890
X-User-Profile-Type: team|customer
X-Django-Profile-ID: <uuid>
X-Customer-ID: <uuid>
```
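Django does not expose these headers under their original names. It upper-cases each name, replaces hyphens with underscores and adds an `HTTP_` prefix before storing it in `request.META`, which is where the `HTTP_X_USER_ID`-style keys in the middleware come from. The small helper below only illustrates that mapping.

```python
def meta_key(header_name: str) -> str:
    """Translate an HTTP header name into its Django request.META key.

    Django upper-cases the name, turns hyphens into underscores and adds an
    HTTP_ prefix (Content-Type and Content-Length are exceptions, which do
    not apply to the headers below).
    """
    return "HTTP_" + header_name.upper().replace("-", "_")


FORWARDED_HEADERS = [
    "X-User-ID", "X-User-Email", "X-User-First-Name", "X-User-Last-Name",
    "X-User-Phone", "X-User-Profile-Type", "X-Django-Profile-ID", "X-Customer-ID",
]

for header in FORWARDED_HEADERS:
    print(f"{header:20} -> {meta_key(header)}")
```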
---
## Django Backend Integration
### 1. Update Django Settings
Add trusted headers and CORS configuration:
```python
# settings.py
# Oathkeeper proxy headers
USE_X_FORWARDED_HOST = True
USE_X_FORWARDED_PORT = True
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
# CORS settings (django-cors-headers package)
CORS_ALLOWED_ORIGINS = [
"https://account.example.com",
"https://auth.example.com",
]
CORS_ALLOW_CREDENTIALS = True
# Session/Cookie settings
SESSION_COOKIE_DOMAIN = '.example.com'
CSRF_COOKIE_DOMAIN = '.example.com'
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SESSION_COOKIE_SAMESITE = 'Lax'
CSRF_COOKIE_SAMESITE = 'Lax'
```
### 2. Create Authentication Middleware
```python
# middleware/kratos_auth.py
from types import SimpleNamespace

# Oathkeeper's forwarded headers, keyed by attribute name. Django exposes a
# header such as "X-User-ID" as request.META["HTTP_X_USER_ID"].
KRATOS_HEADERS = {
    'id': 'HTTP_X_USER_ID',
    'email': 'HTTP_X_USER_EMAIL',
    'first_name': 'HTTP_X_USER_FIRST_NAME',
    'last_name': 'HTTP_X_USER_LAST_NAME',
    'phone': 'HTTP_X_USER_PHONE',
    'profile_type': 'HTTP_X_USER_PROFILE_TYPE',
    'django_profile_id': 'HTTP_X_DJANGO_PROFILE_ID',
    'customer_id': 'HTTP_X_CUSTOMER_ID',
}


class KratosAuthMiddleware:
    """Attach the Kratos identity forwarded by Oathkeeper to each request.

    These headers are only trustworthy if Django is reachable exclusively
    through Oathkeeper; otherwise clients can forge them.
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        identity = {name: request.META.get(key) for name, key in KRATOS_HEADERS.items()}
        if identity['id'] and identity['email']:
            # Look up or create the matching Django user/profile here if needed.
            request.kratos_user = SimpleNamespace(**identity)
        else:
            request.kratos_user = None
        return self.get_response(request)
```
Add to `MIDDLEWARE` in settings.py:
```python
MIDDLEWARE = [
# ... other middleware
'your_app.middleware.kratos_auth.KratosAuthMiddleware',
]
```
### 3. Sync Identity on Registration
When a user registers in Kratos, sync the identity to Django.

**Option A: Webhook (recommended)**
- Configure a Kratos webhook that calls the Django API on identity creation
- Django creates the corresponding TeamProfile/CustomerProfile
- Django returns the new `django_profile_id`, which is stored in the identity's Kratos `metadata_public`

**Option B: Poll/Manual Sync**
- Periodic task to sync new Kratos identities to Django
- Less real-time but simpler to implement
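
For Option A, the Django side only has to create the profile and reply in a form Kratos can apply to the identity. The framework-agnostic Python sketch below assumes the webhook is a synchronous registration hook configured with `response.parse: true`, which lets Kratos update the identity from an `identity` object in the response. Check the Kratos webhook documentation for your version before relying on that behavior. The payload fields and the `create_profile` callback are hypothetical placeholders for your own Jsonnet body template and TeamProfile/CustomerProfile logic. Wire the function into a Django view that returns the status and body it produces, and protect that endpoint (Kratos webhooks can send an API key or basic-auth credentials).

```python
import json
from typing import Callable


def handle_registration_webhook(raw_body: bytes,
                                create_profile: Callable[[dict], str]) -> tuple[int, str]:
    """Create a Django profile for a new Kratos identity.

    raw_body is whatever the webhook's Jsonnet body template sends; the
    "email" field checked here is a hypothetical example of that payload.
    create_profile is your own code and returns the new profile's ID.
    Returns (HTTP status, JSON response body).
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(payload, dict) or not payload.get("email"):
        return 400, json.dumps({"error": "email is required"})
    profile_id = create_profile(payload)
    # Assumption: with response.parse enabled, Kratos applies this "identity"
    # object to the new identity (here, its metadata_public) before saving it.
    body = {"identity": {"metadata_public": {"django_profile_id": profile_id}}}
    return 200, json.dumps(body)
```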
---
## Monitoring & Logging
### 1. Log Aggregation
Collect logs from all services:
```bash
# Kratos logs
docker compose -f nexus-5-auth-kratos/docker-compose.yml logs -f kratos
# Oathkeeper logs
docker compose -f nexus-5-auth-oathkeeper/docker-compose.yml logs -f oathkeeper
# Frontend logs (if using PM2)
pm2 logs nexus-auth-frontend
```
### 2. Metrics to Monitor
- [ ] Kratos health endpoint: `GET /health/ready`
- [ ] Oathkeeper health endpoint: `GET /health/ready`
- [ ] Database connection pool usage
- [ ] Session count
- [ ] Identity count
- [ ] Failed login attempts
- [ ] Email delivery failures
### 3. Set Log Levels
**Production log levels:**
- Kratos: `LOG_LEVEL=info`
- Oathkeeper: `log.level=info`
- Frontend: Configure in SvelteKit
---
## Backup & Recovery
### 1. Database Backups
```bash
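# Backup Kratos database (-T avoids allocating a TTY, which can mangle redirected output;
# --clean --if-exists lets the dump be restored over an existing database)
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres \
  pg_dump -U kratos --clean --if-exists kratos > kratos-backup-$(date +%Y%m%d).sql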
# Restore
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres \
psql -U kratos kratos < kratos-backup-20251014.sql
```
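When these dumps run from cron, old files accumulate. The minimal Python sketch below selects backups to delete. It assumes the `kratos-backup-YYYYMMDD.sql` naming used above and a retention window you choose (14 days here is an arbitrary example).

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

BACKUP_NAME = re.compile(r"^kratos-backup-(\d{8})\.sql$")


def backups_to_prune(names: list[str], today: date, keep_days: int = 14) -> list[str]:
    """Return backup names older than keep_days; the newest one is always kept."""
    dated = []
    for name in names:
        match = BACKUP_NAME.match(name)
        if not match:
            continue  # ignore unrelated files
        try:
            day = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits but not a real date
        dated.append((day, name))
    dated.sort()
    cutoff = today - timedelta(days=keep_days)
    return [name for day, name in dated[:-1] if day < cutoff]


def prune(directory: str, keep_days: int = 14) -> None:
    """Delete the backups selected by backups_to_prune in directory."""
    folder = Path(directory)
    names = [path.name for path in folder.iterdir() if path.is_file()]
    for name in backups_to_prune(names, date.today(), keep_days):
        (folder / name).unlink()
```

`prune(".")` deletes the selected files in the current directory. Run `backups_to_prune` on its own first to see what would be removed.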
### 2. Configuration Backups
- [ ] Backup `nexus-5-auth-kratos/config/`
- [ ] Backup `nexus-5-auth-oathkeeper/config/`
- [ ] Backup `.env` files (encrypted storage!)
- [ ] Backup JWKS keys: `nexus-5-auth-oathkeeper/config/id_token.jwks.json`
---
## Rollback Plan
If issues occur in production:
### 1. Quick Rollback
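```bash
# Stop services (each service has its own compose project;
# adjust the frontend step if it runs under PM2)
for svc in nexus-5-auth-frontend nexus-5-auth-oathkeeper nexus-5-auth-kratos; do
  (cd "$svc" && docker compose down)
done
# Revert tracked configuration to the previous commit
# (.env files not tracked in git must be restored from your encrypted backup)
git checkout HEAD~1 -- nexus-5-auth-*/
# Restart with the old config (--build picks up config baked into images)
for svc in nexus-5-auth-kratos nexus-5-auth-oathkeeper nexus-5-auth-frontend; do
  (cd "$svc" && docker compose up -d --build)
done
```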
### 2. Database Rollback
```bash
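# Stop Kratos so nothing writes to the database, restore, then start it again
docker compose -f nexus-5-auth-kratos/docker-compose.yml stop kratos
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres \
  psql -U kratos kratos < kratos-backup-YYYYMMDD.sql
docker compose -f nexus-5-auth-kratos/docker-compose.yml start kratos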
```
---
## Security Checklist
- [ ] All secrets rotated for production
- [ ] SSL certificates installed and valid
- [ ] HTTPS enforced on all domains
- [ ] Database passwords strong and unique
- [ ] SMTP credentials secured
- [ ] Cookie domain set to `.example.com`
- [ ] Session cookies marked as Secure
- [ ] CORS properly configured
- [ ] Admin API requires authentication
- [ ] Rate limiting configured (if needed)
- [ ] Firewall rules: only 80/443 exposed publicly
- [ ] Internal ports (3000, 4433, 4434, 4455, 4456, 5432) blocked from external access
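
To check the last two items, probe the VM from a machine outside it. A test run on the VM itself proves nothing, and ports published by Docker can bypass host firewall rules such as ufw. In the minimal Python sketch below, a port counts as exposed if a TCP connection succeeds. Call `exposed_ports("your-vm-address")`, where the address is a placeholder, and expect an empty list.

```python
import socket

# Service ports that should only be reachable on the VM itself:
# frontend, Kratos public/admin, Oathkeeper proxy/API, PostgreSQL.
INTERNAL_PORTS = (3000, 4433, 4434, 4455, 4456, 5432)


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def exposed_ports(host: str, ports: tuple[int, ...] = INTERNAL_PORTS) -> list[int]:
    """List the ports from `ports` that accept connections on host."""
    return [port for port in ports if is_port_open(host, port)]
```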
---
## Support & Troubleshooting
### Common Issues
**Issue: "Cookie not being set"**
- Check that `session.cookie.domain` in kratos.yml is `example.com`
- Verify HTTPS is working
- Check browser dev tools > Application > Cookies

**Issue: "CORS errors"**
- Verify the Oathkeeper CORS config includes all domains
- Check `allow_credentials: true`
- Verify the Origin header matches `allowed_origins`

**Issue: "Redirect loop"**
- Check `preserve_host` settings in access rules
- Verify Kratos `allowed_return_urls` includes production domains

**Issue: "WebAuthn not working"**
- Verify `webauthn.config.rp.id` is `example.com`
- Check `webauthn.config.rp.origins` includes production URLs
- Ensure HTTPS is working (WebAuthn requires a secure context)
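
The two WebAuthn settings are related by a WebAuthn rule: the RP ID must equal the origin's host or be a parent domain of it. The simplified Python sketch below checks that rule and is useful for catching typos in `rp.origins`. It ignores the public suffix list that browsers also consult, so treat it as a sanity check only.

```python
from urllib.parse import urlsplit


def origin_matches_rp_id(origin: str, rp_id: str) -> bool:
    """Simplified WebAuthn check: host equals rp_id or is a subdomain of it.

    Does not consult the public suffix list, which real browsers also use.
    """
    parts = urlsplit(origin)
    host = (parts.hostname or "").lower()
    rp_id = rp_id.lower()
    # WebAuthn needs a secure context: HTTPS, or plain HTTP only on localhost.
    if parts.scheme != "https" and host != "localhost":
        return False
    return host == rp_id or host.endswith("." + rp_id)
```

Run it against every entry in `rp.origins`. Any `False` result points at an origin the browser will reject for that RP ID.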
### Debug Commands
```bash
# Check Oathkeeper rules
curl http://localhost:4456/rules | jq .
# Check Kratos sessions
curl -H "Cookie: ory_kratos_session=..." http://localhost:4433/sessions/whoami
# Test Oathkeeper decision API (served on the API port 4456, not the proxy port)
curl -H "Cookie: ory_kratos_session=..." http://localhost:4456/decisions/admin/identities
# View Kratos configuration
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec kratos cat /etc/kratos/kratos.yml
```
---
## Production Deployment Complete! 🎉
Once all checklist items are complete, your Nexus 5 Auth system is production-ready with:

- ✅ Ory Kratos for identity management
- ✅ Ory Oathkeeper for authentication & authorization
- ✅ SvelteKit frontend with admin dashboard
- ✅ Full Django integration with custom headers
- ✅ Secure session management across subdomains
- ✅ WebAuthn/TOTP support
- ✅ Email verification & recovery
- ✅ Complete API endpoint coverage

**Next Steps:**
1. Monitor logs for the first 24 hours
2. Test all user flows in production
3. Set up automated backups
4. Configure monitoring/alerting
5. Document any environment-specific configurations