nexus-5-auth/PRODUCTION-DEPLOYMENT.md
2026-01-26 11:15:52 -05:00


Production Deployment Checklist

Architecture Overview

Production Domains:

  • Frontend: https://account.example.com
  • Oathkeeper Proxy: https://auth.example.com (port 4455)
  • Django API: https://api.example.com
  • Kratos: Internal only (ports 4433/4434)
  • Oathkeeper API: Internal only (port 4456)

All services run on the same VM, so internal communication uses localhost/docker network.


Pre-Deployment Checklist

1. Security Hardening

Kratos Secrets

# Generate new secrets for production
openssl rand -hex 16  # SECRETS_DEFAULT
openssl rand -hex 16  # SECRETS_COOKIE
openssl rand -hex 16  # SECRETS_CIPHER

Update these values in nexus-5-auth-kratos/.env.production:

  • SECRETS_DEFAULT - New random value
  • SECRETS_COOKIE - New random value
  • SECRETS_CIPHER - New random value
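You can also produce the same values from Python with the standard-library `secrets` module, which helps if you script the rollout. This is a minimal sketch and is not part of the repository. Each value is 32 characters long, matching what `openssl rand -hex 16` emits. As we understand the Kratos configuration reference, 32 characters is also the exact length Kratos expects for the cipher secret.

```python
import secrets

# Names of the Kratos secrets listed above
SECRET_NAMES = ("SECRETS_DEFAULT", "SECRETS_COOKIE", "SECRETS_CIPHER")


def generate_kratos_secrets() -> dict:
    """Return one fresh value per secret, equivalent to `openssl rand -hex 16`.

    token_hex(16) encodes 16 random bytes as 32 hex characters, which also
    satisfies the (assumed) exact 32-character requirement for SECRETS_CIPHER.
    """
    return {name: secrets.token_hex(16) for name in SECRET_NAMES}


if __name__ == "__main__":
    # Print KEY=value lines ready to paste into nexus-5-auth-kratos/.env.production
    for name, value in generate_kratos_secrets().items():
        print(f"{name}={value}")
```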

Database Passwords

  • Change POSTGRES_PASSWORD in nexus-5-auth-kratos/.env.production
  • Update KRATOS_DSN with the new URL-encoded password
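A password containing characters such as `@`, `:`, `/` or `#` breaks the DSN unless it is percent-encoded. The sketch below does the encoding with `urllib.parse.quote`. Several parts of the DSN are assumptions based on the Docker Compose layout:

- the host `postgres` and port `5432`
- the user and database name `kratos`
- the `sslmode=disable` suffix, which is only reasonable on the internal Docker network

Match them to your existing `KRATOS_DSN`.

```python
from urllib.parse import quote


def build_kratos_dsn(password: str, user: str = "kratos", host: str = "postgres",
                     port: int = 5432, database: str = "kratos",
                     sslmode: str = "disable") -> str:
    """Build a PostgreSQL DSN for KRATOS_DSN with the password percent-encoded.

    All defaults are assumptions about the Compose setup, not project values.
    """
    # safe="" also encodes "/", which quote() leaves untouched by default
    encoded = quote(password, safe="")
    return f"postgres://{user}:{encoded}@{host}:{port}/{database}?sslmode={sslmode}"


print(build_kratos_dsn("p@ss:w/rd#1"))
# postgres://kratos:p%40ss%3Aw%2Frd%231@postgres:5432/kratos?sslmode=disable
```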

SMTP Configuration

  • Verify the SMTP credentials in nexus-5-auth-kratos/config/kratos.yml (line 128)
  • Prefer supplying the SMTP connection URI through the COURIER_SMTP_CONNECTION_URI environment variable instead of hardcoding credentials in kratos.yml

2. SSL/TLS Configuration

Oathkeeper (https://auth.example.com)

  • Configure reverse proxy (nginx/caddy) for SSL termination
  • Install SSL certificate for auth.example.com
  • Configure proxy to forward to localhost:4455

Frontend (https://account.example.com)

  • Configure reverse proxy for SSL termination
  • Install SSL certificate for account.example.com
  • Configure the proxy to forward to the SvelteKit server (port 3000 by default for a Node production build; 5173 is the Vite dev server and should not be used in production)

CORS Configuration

Verify that Oathkeeper CORS is configured under serve.proxy.cors in nexus-5-auth-oathkeeper/config/oathkeeper.yml:

  • enabled: true
  • https://account.example.com in allowed_origins
  • https://auth.example.com in allowed_origins
  • https://api.example.com in allowed_origins
  • allow_credentials: true
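Because `allow_credentials` is enabled, a browser accepts the response only if two conditions hold:

- `Access-Control-Allow-Origin` echoes the exact requesting origin (a wildcard `*` is rejected).
- `Access-Control-Allow-Credentials: true` is present.

The sketch below sends a preflight `OPTIONS` request and checks both headers. The probed path `/sessions/whoami` is only an example; any path covered by an access rule should work.

```python
import urllib.error
import urllib.request
from typing import Mapping


def cors_allows(headers: Mapping, origin: str) -> bool:
    """True if a CORS response permits credentialed requests from `origin`."""
    allow_origin = headers.get("Access-Control-Allow-Origin")
    allow_creds = headers.get("Access-Control-Allow-Credentials") or ""
    # With credentials, "*" is not accepted: the origin must be echoed exactly
    return allow_origin == origin and allow_creds.lower() == "true"


def preflight(url: str, origin: str, method: str = "GET") -> bool:
    """Send a CORS preflight request and evaluate the response headers."""
    req = urllib.request.Request(url, method="OPTIONS", headers={
        "Origin": origin,
        "Access-Control-Request-Method": method,
    })
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return cors_allows(resp.headers, origin)
    except urllib.error.HTTPError as err:
        # A 4xx/5xx preflight can still carry CORS headers worth inspecting
        return cors_allows(err.headers, origin)


# Usage (network access required):
# preflight("https://auth.example.com/sessions/whoami", "https://account.example.com")
```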

3. Environment Files

Replace .env files with production versions:

# Kratos
cp nexus-5-auth-kratos/.env.production nexus-5-auth-kratos/.env

# Oathkeeper
cp nexus-5-auth-oathkeeper/.env.production nexus-5-auth-oathkeeper/.env

# Frontend
cp nexus-5-auth-frontend/.env.production nexus-5-auth-frontend/.env

Verify all environment variables:

  • nexus-5-auth-kratos/.env
  • nexus-5-auth-oathkeeper/.env
  • nexus-5-auth-frontend/.env
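One way to verify the files is to parse each `.env` and report keys that are missing or empty. The `REQUIRED` mapping below is hypothetical. It lists only the Kratos variables named in this checklist, so extend it with whatever your compose files and SvelteKit app actually read. The parser handles simple `KEY=value` lines only, with no multi-line values or variable interpolation.

```python
from pathlib import Path

# Hypothetical: variables per file that must be present and non-empty
REQUIRED = {
    "nexus-5-auth-kratos/.env": ["SECRETS_DEFAULT", "SECRETS_COOKIE", "SECRETS_CIPHER",
                                 "POSTGRES_PASSWORD", "KRATOS_DSN"],
}


def parse_env(path: Path) -> dict:
    """Parse simple KEY=value lines, ignoring blanks, comments and surrounding quotes."""
    values = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: Path, required: list) -> list:
    """Return required keys that are absent or empty in the given .env file."""
    env = parse_env(path)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    for file, keys in REQUIRED.items():
        path = Path(file)
        if not path.exists():
            print(f"{file}: not found")
            continue
        missing = missing_keys(path, keys)
        print(f"{file}: " + ("OK" if not missing else "missing " + ", ".join(missing)))
```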

Deployment Steps

1. Database Setup

cd nexus-5-auth-kratos

# Start PostgreSQL
docker compose up -d postgres

# Wait for PostgreSQL to be ready: follow the logs until you see
# "database system is ready to accept connections", then press Ctrl+C
docker compose logs -f postgres

# Run Kratos migrations
docker compose run --rm kratos migrate sql -e --yes

2. Deploy Kratos

cd nexus-5-auth-kratos

# Build and start Kratos
docker compose up -d kratos

# Verify it's running
docker compose ps
docker compose logs kratos

# Test health endpoint
curl http://localhost:4433/health/ready

Expected response:

{"status": "ok"}
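A single `curl` checks only one moment. During a deployment it is often more useful to poll until the service reports ready. The sketch below uses only the standard library. It assumes the endpoint answers HTTP 200 with `{"status": "ok"}` once ready, as shown above. The same function should work for Oathkeeper at `http://localhost:4456/health/ready`.

```python
import json
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns {"status": "ok"} or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (OSError, ValueError):
            # Not up yet (refused, timed out, HTTP error, or non-JSON body): retry
            pass
        time.sleep(interval)
    return False


# Usage:
# print(wait_until_ready("http://localhost:4433/health/ready"))
```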

3. Deploy Oathkeeper

cd nexus-5-auth-oathkeeper

# Rebuild with updated config
docker compose build oathkeeper

# Start Oathkeeper
docker compose up -d oathkeeper

# Verify it's running
docker compose ps
docker compose logs oathkeeper

# Test health endpoint
curl http://localhost:4456/health/ready

Expected response:

{"status": "ok"}

4. Test Access Rules

# List all configured rules
curl http://localhost:4456/rules | jq .

# Verify rule count (should be 9 rules)
curl -s http://localhost:4456/rules | jq 'length'

Expected rules:

  1. kratos:self-service
  2. kratos:admin:identities
  3. kratos:admin:recovery
  4. kratos:admin:courier
  5. kratos:admin:sessions
  6. kratos:sessions:api
  7. django:api:public
  8. django:api:protected
  9. django:api:v1
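To make the rule check repeatable, compare the IDs returned by the Oathkeeper API against the expected list. This sketch assumes two things: each rule object in the `/rules` response has an `id` field, as Oathkeeper access rules do, and the nine IDs above are the complete intended set.

```python
import json
import urllib.request

EXPECTED_RULES = {
    "kratos:self-service", "kratos:admin:identities", "kratos:admin:recovery",
    "kratos:admin:courier", "kratos:admin:sessions", "kratos:sessions:api",
    "django:api:public", "django:api:protected", "django:api:v1",
}


def diff_rules(rules: list) -> tuple:
    """Return (missing, unexpected) rule IDs compared with EXPECTED_RULES."""
    loaded = {rule["id"] for rule in rules}
    return EXPECTED_RULES - loaded, loaded - EXPECTED_RULES


def fetch_rules(api_url: str = "http://localhost:4456") -> list:
    """Fetch the rule list from the Oathkeeper API port."""
    with urllib.request.urlopen(f"{api_url}/rules", timeout=10) as resp:
        return json.load(resp)


# Usage:
# missing, unexpected = diff_rules(fetch_rules())
# print("missing:", missing or "none", "| unexpected:", unexpected or "none")
```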

5. Deploy Frontend

Option A: Docker Compose Deployment

cd nexus-5-auth-frontend

# Copy production environment
cp .env.production .env

# Ensure ory-network exists
docker network create ory-network 2>/dev/null || true

# Build and start
docker compose up -d

# Verify it's running
docker compose ps
docker compose logs frontend

# Check that the server responds
curl http://localhost:3000/

Expected response: HTML page content

Option B: PM2 Deployment

cd nexus-5-auth-frontend

# Install dependencies
npm install

# Copy production environment
cp .env.production .env

# Build for production
npm run build

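# Deploy with PM2 (assumes package.json defines a "start" script that
# launches the built server, e.g. "node build")
pm2 start npm --name "nexus-auth-frontend" -- start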
pm2 save

Option C: Direct Node Deployment

cd nexus-5-auth-frontend

# Install dependencies
npm install

# Copy production environment
cp .env.production .env

# Build for production
npm run build

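# Start with node (adapter-node does not load .env at runtime;
# Node 20.6+ can load it with --env-file)
node --env-file=.env build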

6. Configure Reverse Proxy

Example Nginx Configuration

File: /etc/nginx/sites-available/auth.example.com

server {
    listen 443 ssl http2;
    server_name auth.example.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    location / {
        proxy_pass http://localhost:4455;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (if needed)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

File: /etc/nginx/sites-available/account.example.com

server {
    listen 443 ssl http2;
    server_name account.example.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    location / {
        proxy_pass http://localhost:3000;  # Adjust to your SvelteKit port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

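        # WebSocket upgrade headers: only needed for Vite HMR during development.
        # A production SvelteKit build does not use them, so the Upgrade/Connection lines can be removed.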
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Enable sites and reload nginx:

sudo ln -s /etc/nginx/sites-available/auth.example.com /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/account.example.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
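Once nginx is serving both sites, confirm that each certificate is trusted and not close to expiry. The standard-library sketch below relies on `ssl.create_default_context()`, which verifies the chain and hostname during the handshake. An invalid certificate therefore raises `ssl.SSLCertVerificationError`. The remaining days are computed from the certificate's `notAfter` field.

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter date (the format getpeercert() returns)."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - (time.time() if now is None else now)) / 86400


def cert_days_remaining(host: str, port: int = 443) -> float:
    """Connect with full verification and return days until the cert expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return days_left(tls.getpeercert()["notAfter"])


# Usage:
# for host in ("auth.example.com", "account.example.com"):
#     print(host, round(cert_days_remaining(host)), "days left")
```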

Post-Deployment Testing

1. Health Checks

# Kratos public API
curl https://auth.example.com/health/ready

# Kratos admin API through Oathkeeper: without credentials this should be rejected (401/403)
curl -i https://auth.example.com/admin/identities

# Oathkeeper API (internal)
curl http://localhost:4456/health/ready
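The admin endpoint check is meaningful only if an unauthenticated request is actually refused. This sketch reports the HTTP status without raising on 4xx/5xx responses. It assumes a correctly protected route answers 401 or 403; the exact code depends on the authenticator and error handler configured in Oathkeeper.

```python
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, headers: Optional[dict] = None) -> int:
    """Return the HTTP status code for a GET request, including error codes."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def assert_protected(url: str) -> None:
    """Fail loudly if `url` is reachable without credentials."""
    status = http_status(url)
    if status not in (401, 403):
        raise AssertionError(f"{url} returned {status} without credentials")


# Usage:
# assert_protected("https://auth.example.com/admin/identities")
```

If your Oathkeeper error handler redirects unauthenticated requests to a login page, `urllib` follows the redirect. The final status will then be 200 from the login page rather than 401/403.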

2. Frontend Testing

Visit https://account.example.com and test:

  • Registration flow
  • Login flow
  • Email verification
  • Password recovery
  • Settings page
  • Admin dashboard (identity management)
  • Session management
  • Logout

3. WebAuthn and TOTP Testing

  • Register with passkey/security key
  • Login with passkey/security key
  • TOTP (authenticator app) setup
  • TOTP login

4. API Testing

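Test the Django integration once you have an authenticated session (for the cookie-based request, export the ory_kratos_session cookie from your browser into cookies.txt):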

# Public API (no auth)
curl https://auth.example.com/api/public/

# Protected API (with session cookie)
curl -b cookies.txt https://auth.example.com/api/protected/

# Bearer token API
curl -H "Authorization: Bearer YOUR_TOKEN" https://auth.example.com/api/v1/
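If you prefer to script these calls, the sketch below builds the same three requests with `urllib`. It makes three assumptions:

- The cookie request uses the default Kratos session cookie name `ory_kratos_session`, the same name used in the Debug Commands section.
- The bearer token is a session token that the `django:api:v1` rule accepts.
- Both credential values are placeholders you supply.

```python
import urllib.request

BASE = "https://auth.example.com"


def public_request() -> urllib.request.Request:
    """Unauthenticated request to the public Django API."""
    return urllib.request.Request(f"{BASE}/api/public/")


def cookie_request(session_cookie: str) -> urllib.request.Request:
    """Request to the protected API using the Kratos session cookie."""
    return urllib.request.Request(
        f"{BASE}/api/protected/",
        headers={"Cookie": f"ory_kratos_session={session_cookie}"},
    )


def bearer_request(token: str) -> urllib.request.Request:
    """Request to the v1 API using a bearer token."""
    return urllib.request.Request(
        f"{BASE}/api/v1/",
        headers={"Authorization": f"Bearer {token}"},
    )


# Usage (network access required):
# with urllib.request.urlopen(cookie_request("<session cookie value>")) as resp:
#     print(resp.status, resp.read()[:200])
```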

5. Verify Headers Forwarded to Django

Log in as a test identity, call a protected endpoint, and inspect the headers Django receives.

Expected headers from Oathkeeper:

X-User-ID: <kratos_identity_id>
X-User-Email: user@example.com
X-User-First-Name: John
X-User-Last-Name: Doe
X-User-Phone: +1234567890
X-User-Profile-Type: team|customer
X-Django-Profile-ID: <uuid>
X-Customer-ID: <uuid>
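Before wiring up the real middleware, a throwaway endpoint that echoes the identity headers shows exactly what Oathkeeper forwards. The sketch below is a hypothetical standard-library WSGI app, not part of the project. You could run it temporarily as the upstream of a test access rule. It returns every `X-User-*`, `X-Django-*` and `X-Customer-*` header as JSON. The port in the usage comment is an arbitrary choice.

```python
import json

# Prefixes of the identity headers forwarded by Oathkeeper (WSGI environ form)
PREFIXES = ("HTTP_X_USER_", "HTTP_X_DJANGO_", "HTTP_X_CUSTOMER_")


def echo_identity_headers(environ, start_response):
    """WSGI app that returns the forwarded identity headers as JSON."""
    headers = {
        # HTTP_X_USER_ID -> X-User-Id (WSGI upper-cases names and uses underscores)
        key[5:].replace("_", "-").title(): value
        for key, value in environ.items()
        if key.startswith(PREFIXES)
    }
    body = json.dumps(headers, indent=2, sort_keys=True).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Usage (hypothetical port):
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8001, echo_identity_headers).serve_forever()
```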

Django Backend Integration

1. Update Django Settings

Add trusted headers and CORS configuration:

# settings.py

# Oathkeeper proxy headers
USE_X_FORWARDED_HOST = True
USE_X_FORWARDED_PORT = True
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

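# CORS settings (requires the django-cors-headers package,
# with 'corsheaders' in INSTALLED_APPS and CorsMiddleware in MIDDLEWARE)
CORS_ALLOWED_ORIGINS = [
    "https://account.example.com",
    "https://auth.example.com",
]
CORS_ALLOW_CREDENTIALS = True

# Django 4.0+ checks the Origin header of unsafe requests against this list
CSRF_TRUSTED_ORIGINS = [
    "https://account.example.com",
    "https://auth.example.com",
]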

# Session/Cookie settings
SESSION_COOKIE_DOMAIN = '.example.com'
CSRF_COOKIE_DOMAIN = '.example.com'
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SESSION_COOKIE_SAMESITE = 'Lax'
CSRF_COOKIE_SAMESITE = 'Lax'

2. Create Authentication Middleware

# middleware/kratos_auth.py

# request.META key -> field name on request.kratos_user
KRATOS_HEADERS = {
    'HTTP_X_USER_ID': 'id',
    'HTTP_X_USER_EMAIL': 'email',
    'HTTP_X_USER_FIRST_NAME': 'first_name',
    'HTTP_X_USER_LAST_NAME': 'last_name',
    'HTTP_X_USER_PHONE': 'phone',
    'HTTP_X_USER_PROFILE_TYPE': 'profile_type',
    'HTTP_X_DJANGO_PROFILE_ID': 'django_profile_id',
    'HTTP_X_CUSTOMER_ID': 'customer_id',
}


class KratosAuthMiddleware:
    """Expose the identity forwarded by Oathkeeper as request.kratos_user.

    Only trust these headers if Django is reachable exclusively through
    Oathkeeper; otherwise a client could set them directly.
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Extract the Kratos identity from the headers set by Oathkeeper
        identity = {field: request.META.get(key) for key, field in KRATOS_HEADERS.items()}

        # Look up or create the Django user/profile here if your views need one
        request.kratos_user = identity if identity['id'] and identity['email'] else None

        return self.get_response(request)

Add to MIDDLEWARE in settings.py:

MIDDLEWARE = [
    # ... other middleware
    'your_app.middleware.kratos_auth.KratosAuthMiddleware',
]

3. Sync Identity on Registration

When a user registers in Kratos, sync to Django:

Option A: Webhook (recommended)

  • Configure a Kratos web hook on the registration flow to call the Django API when an identity is created
  • Django creates the corresponding TeamProfile/CustomerProfile
  • Django returns the django_profile_id, which is stored in the identity's metadata_public
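Below is a sketch of the Django side of Option A. It is written as a plain function so the logic is visible without framework boilerplate. It rests on three assumptions, all of which you should check against the Kratos web hook documentation for your version:

1. The web hook's Jsonnet body template sends `{"identity_id": ..., "traits": ...}`. The real payload shape is whatever your template defines.
2. `create_profile` is a hypothetical callable standing in for your TeamProfile/CustomerProfile creation.
3. The hook is configured so that Kratos parses the response and applies the returned `identity.metadata_public`. The documentation also states which flows allow a response to modify the identity.

```python
import json
from typing import Callable


def handle_registration_webhook(
    raw_body: bytes,
    create_profile: Callable[[str, dict], str],
) -> bytes:
    """Create a Django profile for a new Kratos identity and return a web hook
    response body that stores the profile ID in metadata_public.

    create_profile(identity_id, traits) -> profile_id is a hypothetical hook
    into your TeamProfile/CustomerProfile models.
    """
    payload = json.loads(raw_body)
    identity_id = payload["identity_id"]   # assumed field from the Jsonnet template
    traits = payload.get("traits", {})     # assumed field from the Jsonnet template
    profile_id = create_profile(identity_id, traits)
    # Assumed response shape Kratos reads when response parsing is enabled
    response = {"identity": {"metadata_public": {"django_profile_id": profile_id}}}
    return json.dumps(response).encode()
```

Protect the Django endpoint so that only Kratos can call it, for example with a shared API key sent by the web hook.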

Option B: Poll/Manual Sync

  • Periodic task to sync new Kratos identities to Django
  • Less real-time but simpler to implement
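For Option B, the core of the periodic task is deciding which identities still need a Django profile. This sketch assumes a profile is recorded as `metadata_public.django_profile_id`, matching Option A. It fetches a single page from the Kratos admin API on port 4434. Pagination parameters differ between Kratos versions, so add paging according to the API reference for your version.

```python
import json
import urllib.request


def identities_needing_sync(identities: list) -> list:
    """IDs of identities whose metadata_public lacks a django_profile_id."""
    pending = []
    for identity in identities:
        # metadata_public may be absent or null on new identities
        metadata = identity.get("metadata_public") or {}
        if not metadata.get("django_profile_id"):
            pending.append(identity["id"])
    return pending


def fetch_identities(admin_url: str = "http://localhost:4434") -> list:
    """Fetch one page of identities from the Kratos admin API (internal only)."""
    with urllib.request.urlopen(f"{admin_url}/admin/identities", timeout=10) as resp:
        return json.load(resp)


# Usage from a cron job or task scheduler:
# for identity_id in identities_needing_sync(fetch_identities()):
#     ...  # create the Django profile, then write django_profile_id back to Kratos
```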

Monitoring & Logging

1. Log Aggregation

Follow the logs of each service (ship them to your log aggregation system for retention and search):

# Kratos logs
docker compose -f nexus-5-auth-kratos/docker-compose.yml logs -f kratos

# Oathkeeper logs
docker compose -f nexus-5-auth-oathkeeper/docker-compose.yml logs -f oathkeeper

# Frontend logs (if using PM2)
pm2 logs nexus-auth-frontend

2. Metrics to Monitor

  • Kratos health endpoint: GET /health/ready
  • Oathkeeper health endpoint: GET /health/ready
  • Database connection pool usage
  • Session count
  • Identity count
  • Failed login attempts
  • Email delivery failures

3. Set Log Levels

Production log levels:

  • Kratos: LOG_LEVEL=info (log.level in kratos.yml)
  • Oathkeeper: LOG_LEVEL=info (log.level in oathkeeper.yml)
  • Frontend: configure in the SvelteKit application's own logger

Backup & Recovery

1. Database Backups

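# Back up the Kratos database
# (-T disables TTY allocation so the redirected output is not corrupted;
#  --clean --if-exists lets the dump be restored over an existing database)
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres \
  pg_dump -U kratos --clean --if-exists kratos > kratos-backup-$(date +%Y%m%d).sql

# Restore (replace YYYYMMDD with the backup date)
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres \
  psql -U kratos kratos < kratos-backup-YYYYMMDD.sql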
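Automating the dump usually means keeping only the most recent N files. The sketch below runs `pg_dump` inside the postgres container via `subprocess`, using `docker compose exec -T` so the redirected output is clean, and then prunes older dumps. The backup directory and retention count are assumptions. Pruning relies on the `kratos-backup-YYYYMMDD.sql` naming scheme sorting chronologically.

```python
import datetime
import subprocess
from pathlib import Path

COMPOSE_FILE = "nexus-5-auth-kratos/docker-compose.yml"
BACKUP_DIR = Path("backups")   # assumed location
KEEP = 14                      # assumed retention: keep the newest 14 dumps


def files_to_prune(names: list, keep: int) -> list:
    """Given backup file names, return the ones older than the newest `keep`."""
    # YYYYMMDD sorts chronologically as a string, so sorted() orders by date
    dumps = sorted(n for n in names if n.startswith("kratos-backup-") and n.endswith(".sql"))
    return dumps[:-keep] if keep > 0 else dumps


def run_backup() -> Path:
    """Dump the Kratos database to BACKUP_DIR and prune old dumps."""
    BACKUP_DIR.mkdir(exist_ok=True)
    target = BACKUP_DIR / f"kratos-backup-{datetime.date.today():%Y%m%d}.sql"
    with target.open("wb") as out:
        # --clean --if-exists makes the dump restorable over an existing database
        subprocess.run(
            ["docker", "compose", "-f", COMPOSE_FILE, "exec", "-T", "postgres",
             "pg_dump", "-U", "kratos", "--clean", "--if-exists", "kratos"],
            stdout=out, check=True,
        )
    for name in files_to_prune([p.name for p in BACKUP_DIR.iterdir()], KEEP):
        (BACKUP_DIR / name).unlink()
    return target
```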

2. Configuration Backups

  • Back up nexus-5-auth-kratos/config/
  • Back up nexus-5-auth-oathkeeper/config/
  • Back up .env files (to encrypted storage only, since they contain secrets)
  • Back up the JWKS keys: nexus-5-auth-oathkeeper/config/id_token.jwks.json

Rollback Plan

If issues occur in production:

1. Quick Rollback

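# Stop services (each service has its own compose project)
for dir in nexus-5-auth-kratos nexus-5-auth-oathkeeper nexus-5-auth-frontend; do
  (cd "$dir" && docker compose down)
done

# Revert tracked configuration to the previous commit
# (.env files are usually not committed; restore them from your encrypted backups)
git checkout HEAD~1 -- nexus-5-auth-*/

# Restart with the old config (--build picks up config baked into images, e.g. Oathkeeper)
for dir in nexus-5-auth-kratos nexus-5-auth-oathkeeper nexus-5-auth-frontend; do
  (cd "$dir" && docker compose up -d --build)
done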

2. Database Rollback

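# Restore from backup (stop Kratos first so nothing writes during the restore)
docker compose -f nexus-5-auth-kratos/docker-compose.yml stop kratos
docker compose -f nexus-5-auth-kratos/docker-compose.yml exec -T postgres psql -U kratos kratos < kratos-backup-YYYYMMDD.sql
docker compose -f nexus-5-auth-kratos/docker-compose.yml start kratos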

Security Checklist

  • All secrets rotated for production
  • SSL certificates installed and valid
  • HTTPS enforced on all domains
  • Database passwords strong and unique
  • SMTP credentials secured
  • Cookie domain set to .example.com
  • Session cookies marked as Secure
  • CORS properly configured
  • Admin API requires authentication
  • Rate limiting configured (if needed)
  • Firewall rules: Only 443/80 exposed publicly
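  • Internal ports (3000, 4433, 4434, 4455, 4456, 5432) blocked from external access; bind published Docker ports to 127.0.0.1, because Docker-published ports bypass host firewalls such as ufw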
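You can spot-check the last two items from a machine outside the VM. A TCP connect to each internal port should fail, while 80 and 443 should succeed. The minimal sketch below uses the standard `socket` module, and the hostname is a placeholder. It tests reachability only from wherever you run it, so run it from an external host. The port list also includes 3000 and 4455 on the assumption that nginx is their only client.

```python
import socket

PUBLIC_PORTS = (80, 443)
# Assumed internal-only: 3000 (frontend) and 4455 (Oathkeeper proxy) sit behind nginx
INTERNAL_PORTS = (3000, 4433, 4434, 4455, 4456, 5432)


def is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def audit(host: str) -> list:
    """Return human-readable problems found for `host`."""
    problems = [f"public port {p} is closed" for p in PUBLIC_PORTS if not is_open(host, p)]
    problems += [f"internal port {p} is reachable" for p in INTERNAL_PORTS if is_open(host, p)]
    return problems


# Usage (run from outside the VM; hostname is a placeholder):
# print(audit("auth.example.com") or "no problems found")
```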

Support & Troubleshooting

Common Issues

Issue: "Cookie not being set"

  • Check that session.cookie.domain in kratos.yml is set to example.com
  • Verify HTTPS is working
  • Check browser dev tools > Application > Cookies

Issue: "CORS errors"

  • Verify Oathkeeper CORS config includes all domains
  • Check allow_credentials: true
  • Verify that the request's Origin header exactly matches an entry in allowed_origins (scheme and host included)

Issue: "Redirect loop"

  • Check preserve_host settings in access rules
  • Verify Kratos allowed_return_urls includes production domains

Issue: "WebAuthn not working"

  • Verify webauthn.config.rp.id is example.com
  • Check webauthn.config.rp.origins includes production URLs
  • Ensure HTTPS is working (WebAuthn requires secure context)

Debug Commands

# Check Oathkeeper rules
curl http://localhost:4456/rules | jq .

# Check Kratos sessions
curl -H "Cookie: ory_kratos_session=..." http://localhost:4433/sessions/whoami

# Test Oathkeeper decision API (served on the API port, 4456)
curl -H "Cookie: ory_kratos_session=..." http://localhost:4456/decisions/admin/identities

# View Kratos configuration
docker compose exec kratos cat /etc/kratos/kratos.yml

Production Deployment Complete! 🎉

Once all checklist items are complete, your Nexus 5 Auth system is production-ready with:

  • Ory Kratos for identity management
  • Ory Oathkeeper for authentication & authorization
  • SvelteKit frontend with admin dashboard
  • Full Django integration with custom headers
  • Secure session management across subdomains
  • WebAuthn/TOTP support
  • Email verification & recovery
  • Complete API endpoint coverage

Next Steps:

  1. Monitor logs for the first 24 hours
  2. Test all user flows in production
  3. Set up automated backups
  4. Configure monitoring/alerting
  5. Document any environment-specific configurations