
AOF Production Deployment Guide

Comprehensive guide for deploying the Agentic Ops Framework (AOF) in production environments

Table of Contents

  1. Prerequisites
  2. Deployment Options
  3. Platform Setup
  4. Configuration
  5. Security
  6. Monitoring
  7. Scaling
  8. Troubleshooting

1. Prerequisites

System Requirements

Minimum Requirements:

  • CPU: 2 cores (4 cores recommended)
  • RAM: 4 GB (8 GB recommended)
  • Disk: 10 GB available space
  • OS: Linux (Ubuntu 20.04+, RHEL 8+), macOS 11+, Windows Server 2019+

Network Requirements:

  • Outbound HTTPS (443) for LLM provider APIs
  • Inbound ports for webhook servers (default: 8080)
  • Low latency connection to LLM providers (<100ms recommended)

Dependencies

Core Dependencies:

# Rust toolchain (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
rustup update

# Build tools (Linux)
sudo apt-get update
sudo apt-get install -y build-essential pkg-config libssl-dev

# Build tools (macOS)
xcode-select --install

# Optional: Docker & Docker Compose
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

Runtime Dependencies:

  • OpenSSL 1.1.1+ or 3.0+
  • glibc 2.31+ (Linux)
  • libc++ (macOS)

API Keys Required

  1. LLM Provider Keys (at least one):

    • Anthropic: ANTHROPIC_API_KEY
    • OpenAI: OPENAI_API_KEY
    • AWS Bedrock: AWS credentials configured
  2. Platform Integration Keys (optional):

    • WhatsApp Business: WHATSAPP_ACCESS_TOKEN, WHATSAPP_VERIFY_TOKEN
    • Telegram: TELEGRAM_BOT_TOKEN
    • Slack: SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET
    • Discord: DISCORD_BOT_TOKEN, DISCORD_PUBLIC_KEY
  3. MCP Server Access (optional):

    • Server-specific authentication tokens
    • OAuth credentials for cloud-based MCP servers

2. Deployment Options

Option A: Standalone Binary

Best for: Simple deployments, single-server setups, development

# Build release binary
cargo build --release --workspace

# Install aofctl globally
cargo install --path crates/aofctl

# Verify installation
aofctl --version

# Run agent
aofctl run --config /etc/aof/agent.yaml \
  --input "Deploy application to production"
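
Because `aofctl` needs at least one LLM provider key at runtime, a small pre-flight check in a launch wrapper fails fast with a clear message instead of a mid-run API error. This wrapper is a sketch, not part of AOF itself:

```shell
#!/bin/sh
# Pre-flight check: require at least one LLM provider key before starting aofctl.
check_provider_keys() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ] && [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "error: set ANTHROPIC_API_KEY or OPENAI_API_KEY" >&2
    return 1
  fi
  echo "provider key present"
}

# Usage: check_provider_keys && exec aofctl run --config /etc/aof/agent.yaml
```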

Production Setup:

# Create system user
sudo useradd -r -s /bin/false aof

# Install binary
sudo cp target/release/aofctl /usr/local/bin/
sudo chmod +x /usr/local/bin/aofctl

# Create directories
sudo mkdir -p /etc/aof /var/lib/aof /var/log/aof
sudo chown -R aof:aof /var/lib/aof /var/log/aof

# Create systemd service
sudo tee /etc/systemd/system/aof-agent.service > /dev/null <<'EOF'
[Unit]
Description=AOF Agent Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=aof
Group=aof
ExecStart=/usr/local/bin/aofctl run --config /etc/aof/agent.yaml
Restart=always
RestartSec=10
StandardOutput=append:/var/log/aof/agent.log
StandardError=append:/var/log/aof/agent.err

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/aof /var/log/aof

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable aof-agent
sudo systemctl start aof-agent

Option B: Docker Container

Best for: Containerized environments, Kubernetes, cloud platforms

Dockerfile:

# Multi-stage build for minimal image size
FROM rust:1.75-slim-bookworm AS builder

WORKDIR /build

# Install dependencies
RUN apt-get update && apt-get install -y \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy workspace
COPY . .

# Build release
RUN cargo build --release --workspace

# Runtime stage
FROM debian:bookworm-slim

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    ca-certificates \
    libssl3 \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -r -s /bin/false aof

# Copy binaries
COPY --from=builder /build/target/release/aofctl /usr/local/bin/
COPY --from=builder /build/target/release/aof-triggers /usr/local/bin/

# Set permissions
RUN chown aof:aof /usr/local/bin/aofctl /usr/local/bin/aof-triggers

USER aof
WORKDIR /app

EXPOSE 8080

CMD ["aofctl", "run", "--config", "/config/agent.yaml"]

Build and Run:

# Build image
docker build -t aof:latest .

# Run container
docker run -d \
  --name aof-agent \
  --restart unless-stopped \
  -v $(pwd)/config:/config:ro \
  -v aof-data:/data \
  -e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
  -e RUST_LOG=info \
  -p 8080:8080 \
  aof:latest

# Check logs
docker logs -f aof-agent

Docker Compose:

version: '3.8'

services:
  aof-agent:
    build: .
    container_name: aof-agent
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./config:/config:ro
      - aof-data:/data
      - aof-logs:/logs
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - RUST_LOG=info
      - AOF_MEMORY_BACKEND=redis
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    networks:
      - aof-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  redis:
    image: redis:7-alpine
    container_name: aof-redis
    restart: unless-stopped
    volumes:
      - redis-data:/data
    networks:
      - aof-network
    command: redis-server --appendonly yes

  aof-triggers:
    build: .
    container_name: aof-triggers
    restart: unless-stopped
    ports:
      - "8081:8080"
    volumes:
      - ./config:/config:ro
    environment:
      - WHATSAPP_ACCESS_TOKEN=${WHATSAPP_ACCESS_TOKEN}
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
      - SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN}
      - DISCORD_BOT_TOKEN=${DISCORD_BOT_TOKEN}
    networks:
      - aof-network
    command: ["/usr/local/bin/aof-triggers"]

volumes:
  aof-data:
  aof-logs:
  redis-data:

networks:
  aof-network:
    driver: bridge

Option C: Kubernetes Deployment

Best for: Large-scale deployments, high availability, auto-scaling

Namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: aof-system
  labels:
    app.kubernetes.io/name: aof

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aof-config
  namespace: aof-system
data:
  agent.yaml: |
    apiVersion: aof.dev/v1
    kind: Agent
    metadata:
      name: production-agent
    spec:
      model:
        provider: anthropic
        model: claude-3-5-sonnet-20241022
      tools:
        - name: kubectl
          type: mcp
          config:
            command: npx
            args: ["-y", "kubectl-mcp"]
      memory:
        backend: redis
        config:
          url: redis://aof-redis:6379

Secret:

apiVersion: v1
kind: Secret
metadata:
  name: aof-secrets
  namespace: aof-system
type: Opaque
stringData:
  anthropic-api-key: "sk-ant-..."
  openai-api-key: "sk-..."
  whatsapp-token: "..."
  telegram-token: "..."

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aof-agent
  namespace: aof-system
  labels:
    app: aof-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aof-agent
  template:
    metadata:
      labels:
        app: aof-agent
    spec:
      serviceAccountName: aof-agent
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: agent
          image: aof:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: aof-secrets
                  key: anthropic-api-key
            - name: RUST_LOG
              value: "info"
            - name: AOF_MEMORY_BACKEND
              value: "redis"
            - name: REDIS_URL
              value: "redis://aof-redis:6379"
          volumeMounts:
            - name: config
              mountPath: /config
              readOnly: true
            - name: data
              mountPath: /data
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: aof-config
        - name: data
          persistentVolumeClaim:
            claimName: aof-data

Service:

apiVersion: v1
kind: Service
metadata:
  name: aof-agent
  namespace: aof-system
spec:
  type: ClusterIP
  selector:
    app: aof-agent
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http

HorizontalPodAutoscaler:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aof-agent
  namespace: aof-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aof-agent
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Deploy to Kubernetes:

# Apply manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml

# Verify deployment
kubectl get pods -n aof-system
kubectl logs -f deployment/aof-agent -n aof-system

# Check scaling
kubectl get hpa -n aof-system -w

Option D: Desktop App Distribution

Best for: End-user deployments, local development, GUI access

Build Desktop App:

# Build Tauri application
cd crates/aof-gui
npm install
npm run tauri build

# Output locations:
# - macOS: target/release/bundle/dmg/AOF.dmg
# - Windows: target/release/bundle/msi/AOF.msi
# - Linux: target/release/bundle/deb/aof_*.deb

macOS Distribution:

# Sign application
codesign --deep --force --verify --verbose \
  --sign "Developer ID Application: Your Name" \
  --options runtime \
  target/release/bundle/macos/AOF.app

# Notarize
xcrun notarytool submit target/release/bundle/dmg/AOF.dmg \
  --apple-id "your@email.com" \
  --password "app-specific-password" \
  --team-id "TEAM_ID"

# Staple ticket
xcrun stapler staple target/release/bundle/dmg/AOF.dmg

Windows Distribution:

# Sign MSI
signtool sign /f cert.pfx /p password /tr http://timestamp.digicert.com /td sha256 /fd sha256 target/release/bundle/msi/AOF.msi

Linux Distribution:

# Build AppImage
cd crates/aof-gui
cargo install cargo-appimage
cargo appimage

# Build Flatpak
flatpak-builder --repo=repo build-dir io.aof.App.yaml

3. Platform Setup

WhatsApp Business API Setup

Prerequisites:

  • Meta Business Account
  • WhatsApp Business Account
  • A phone number not already registered on WhatsApp

Step 1: Create WhatsApp Business App

# Via Meta Developer Console
# 1. Go to https://developers.facebook.com/apps
# 2. Create App > Business > WhatsApp
# 3. Add WhatsApp product
# 4. Get test number or add your own

Step 2: Configure Webhook

# config/whatsapp.yaml
webhook_url: "https://your-domain.com/webhooks/whatsapp"
verify_token: "your-random-verify-token-here"
access_token: "EAAxxxxxxxxxxxxxxxx"
phone_number_id: "123456789"

Step 3: Set Environment Variables

export WHATSAPP_ACCESS_TOKEN="EAAxxxxxxxxxxxxxxxx"
export WHATSAPP_VERIFY_TOKEN="your-random-verify-token-here"
export WHATSAPP_PHONE_NUMBER_ID="123456789"

Step 4: Verify Webhook

# Meta will send GET request to verify
# Your server must return challenge parameter
curl "https://your-domain.com/webhooks/whatsapp?hub.mode=subscribe&hub.challenge=CHALLENGE&hub.verify_token=your-random-verify-token-here"
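
The handshake logic your handler must implement is simple: if `hub.mode` is `subscribe` and `hub.verify_token` matches your configured token, respond 200 with the raw `hub.challenge` value; otherwise respond 403. A shell sketch with illustrative values:

```shell
# Query-string inputs your handler receives (example values)
hub_mode="subscribe"
hub_challenge="1158201444"
hub_verify_token="your-random-verify-token-here"

# The token you configured for the app
expected_token="your-random-verify-token-here"

if [ "$hub_mode" = "subscribe" ] && [ "$hub_verify_token" = "$expected_token" ]; then
  # Respond 200 with the raw challenge value
  echo "$hub_challenge"
else
  # Respond 403 and do not echo the challenge
  echo "forbidden" >&2
fi
```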

Step 5: Test Integration

# Start webhook server
cargo run -p aof-triggers

# Send test message via WhatsApp
# Server logs should show incoming webhook

Telegram Bot Creation

Step 1: Create Bot via BotFather

# 1. Open Telegram and search for @BotFather
# 2. Send /newbot
# 3. Follow prompts to set name and username
# 4. Save the bot token

Step 2: Configure Bot

# Set webhook
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-domain.com/webhooks/telegram",
    "secret_token": "your-secret-token"
  }'

# Verify webhook
curl "https://api.telegram.org/bot<TOKEN>/getWebhookInfo"

Step 3: Environment Variables

export TELEGRAM_BOT_TOKEN="1234567890:ABCdefGHIjklMNOpqrsTUVwxyz"
export TELEGRAM_WEBHOOK_SECRET="your-secret-token"

Step 4: Bot Commands

# Set bot commands
# Set bot commands
curl -X POST "https://api.telegram.org/bot<TOKEN>/setMyCommands" \
  -H "Content-Type: application/json" \
  -d '{
    "commands": [
      {"command": "start", "description": "Start the agent"},
      {"command": "help", "description": "Show help"},
      {"command": "status", "description": "Check agent status"}
    ]
  }'

Slack App Configuration

Step 1: Create Slack App

# 1. Go to https://api.slack.com/apps
# 2. Click "Create New App"
# 3. Choose "From scratch"
# 4. Enter app name and workspace

Step 2: Configure OAuth & Permissions

# Required Bot Token Scopes:
- chat:write
- chat:write.public
- commands
- im:history
- im:read
- im:write
- users:read

Step 3: Enable Event Subscriptions

# Request URL: https://your-domain.com/webhooks/slack
# Subscribe to bot events:
- message.im
- app_mention

Step 4: Install App to Workspace

# Get tokens from OAuth & Permissions page
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_SIGNING_SECRET="..."
export SLACK_APP_TOKEN="xapp-..." # For Socket Mode

Step 5: Create Slash Commands (Optional)

# Command: /aof
# Request URL: https://your-domain.com/webhooks/slack/commands
# Short Description: "Run AOF agent"

Discord Bot Setup

Step 1: Create Discord Application

# 1. Go to https://discord.com/developers/applications
# 2. Click "New Application"
# 3. Enter application name
# 4. Go to "Bot" section
# 5. Click "Add Bot"

Step 2: Configure Bot Permissions

# Required Permissions:
- Send Messages
- Read Message History
- Use Slash Commands
- Embed Links
- Attach Files

Step 3: Get Bot Token

export DISCORD_BOT_TOKEN="MTxxxxxxxxxxxxxxxxxx.xxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx"
export DISCORD_PUBLIC_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export DISCORD_APPLICATION_ID="123456789012345678"

Step 4: Register Slash Commands

curl -X POST \
  "https://discord.com/api/v10/applications/${DISCORD_APPLICATION_ID}/commands" \
  -H "Authorization: Bot ${DISCORD_BOT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "aof",
    "description": "Execute AOF agent task",
    "options": [
      {
        "name": "task",
        "description": "Task description",
        "type": 3,
        "required": true
      }
    ]
  }'

Step 5: Invite Bot to Server

# Generate OAuth2 URL with bot scope and permissions
# https://discord.com/oauth2/authorize?client_id=CLIENT_ID&scope=bot+applications.commands&permissions=PERMISSIONS
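
Substituting your own values into that URL template is easy to get wrong by hand; a tiny script assembles it from the application ID and a permissions bitmask (both values below are placeholders — use your own from the Developer Portal):

```shell
# Build the bot invite URL from placeholder values.
client_id="123456789012345678"
permissions="274877975552"
invite_url="https://discord.com/oauth2/authorize?client_id=${client_id}&scope=bot+applications.commands&permissions=${permissions}"
echo "$invite_url"
```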

4. Configuration

Environment Variables

Core Variables:

# LLM Provider Configuration
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# AWS Bedrock (uses standard AWS credential environment variables instead of an API key)
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# Application Settings
export RUST_LOG="info,aof_core=debug"
export AOF_CONFIG_PATH="/etc/aof/agent.yaml"
export AOF_DATA_DIR="/var/lib/aof"

# Memory Backend
export AOF_MEMORY_BACKEND="redis" # Options: memory, redis, sled, file
export REDIS_URL="redis://localhost:6379"

# Webhook Server
export AOF_WEBHOOK_HOST="0.0.0.0"
export AOF_WEBHOOK_PORT="8080"

# Platform Tokens
export WHATSAPP_ACCESS_TOKEN="..."
export TELEGRAM_BOT_TOKEN="..."
export SLACK_BOT_TOKEN="..."
export DISCORD_BOT_TOKEN="..."

Production .env File:

# /etc/aof/.env
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx
RUST_LOG=info,aof_core=debug,aof_runtime=info
AOF_MEMORY_BACKEND=redis
REDIS_URL=redis://:password@localhost:6379/0
AOF_WEBHOOK_HOST=0.0.0.0
AOF_WEBHOOK_PORT=8080
WHATSAPP_ACCESS_TOKEN=EAAxxxxx
TELEGRAM_BOT_TOKEN=123456:ABCxxx
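
Under systemd, point `EnvironmentFile=` at this file; from a plain shell, the `set -a` trick exports every variable the file assigns. A sketch (it writes a throwaway `./aof.env` for illustration; in production you would source `/etc/aof/.env` directly):

```shell
# Create a throwaway .env file for the demonstration
ENV_FILE="./aof.env"
cat > "$ENV_FILE" <<'EOF'
AOF_WEBHOOK_PORT=8080
AOF_MEMORY_BACKEND=redis
EOF

# `set -a` marks every subsequent assignment for export
set -a
. "$ENV_FILE"
set +a

echo "port=${AOF_WEBHOOK_PORT} backend=${AOF_MEMORY_BACKEND}"
```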

YAML Configuration

Agent Configuration:

# /etc/aof/agent.yaml
apiVersion: aof.dev/v1
kind: Agent
metadata:
  name: production-devops-agent
  labels:
    environment: production
    team: platform
spec:
  # Model configuration
  model:
    provider: anthropic # Options: anthropic, openai, bedrock
    model: claude-3-5-sonnet-20241022
    temperature: 0.7
    max_tokens: 4096
    timeout_seconds: 300

  # Tool configuration
  tools:
    - name: kubectl
      type: mcp
      config:
        command: npx
        args: ["-y", "kubectl-mcp"]
        transport: stdio

    - name: aws-cli
      type: mcp
      config:
        command: npx
        args: ["-y", "aws-mcp"]
        transport: stdio

    - name: terraform
      type: custom
      config:
        command: terraform
        allowed_commands: ["plan", "apply", "destroy"]

  # Memory configuration
  memory:
    backend: redis
    config:
      url: redis://localhost:6379
      db: 0
      pool_size: 10
      timeout_seconds: 5
      ttl_seconds: 3600

  # Execution limits
  limits:
    max_iterations: 50
    max_execution_time_seconds: 1800
    max_memory_mb: 2048

  # Retry configuration
  retry:
    max_attempts: 3
    backoff_seconds: 5
    exponential: true
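
With `exponential: true` and `backoff_seconds: 5`, retries wait on a doubling schedule: 5s, then 10s, then 20s for `max_attempts: 3`. A quick shell sketch of the arithmetic (illustrative only, not AOF's actual scheduler):

```shell
# Doubling backoff starting from a 5-second base.
base=5
delay=$base
for attempt in 1 2 3; do
  echo "attempt ${attempt}: wait ${delay}s"
  delay=$(( delay * 2 ))
done
```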

Multi-Agent Configuration:

# /etc/aof/agents.yaml
apiVersion: aof.dev/v1
kind: AgentList
agents:
  - metadata:
      name: k8s-operator
    spec:
      model:
        provider: anthropic
        model: claude-3-5-sonnet-20241022
      tools: [kubectl, helm]

  - metadata:
      name: aws-architect
    spec:
      model:
        provider: bedrock
        model: anthropic.claude-3-5-sonnet-20241022-v2:0
      tools: [aws-cli, terraform]

  - metadata:
      name: security-auditor
    spec:
      model:
        provider: openai
        model: gpt-4
      tools: [trivy, kubesec]

Secrets Management

Option 1: Environment Variables

# Load from secure vault
export ANTHROPIC_API_KEY=$(vault kv get -field=key secret/aof/anthropic)
export OPENAI_API_KEY=$(vault kv get -field=key secret/aof/openai)

Option 2: Kubernetes Secrets

# Create secret from file
kubectl create secret generic aof-secrets \
  --from-file=anthropic-key=/path/to/key \
  --from-file=openai-key=/path/to/key \
  -n aof-system

# Use in deployment
env:
  - name: ANTHROPIC_API_KEY
    valueFrom:
      secretKeyRef:
        name: aof-secrets
        key: anthropic-key

Option 3: AWS Secrets Manager

// Load at runtime
use aws_config::load_from_env;
use aws_sdk_secretsmanager::Client;

async fn load_secret(name: &str) -> String {
    let config = load_from_env().await;
    let client = Client::new(&config);
    let resp = client
        .get_secret_value()
        .secret_id(name)
        .send()
        .await
        .unwrap();
    resp.secret_string().unwrap().to_string()
}

Option 4: HashiCorp Vault

# Initialize Vault
vault kv put secret/aof/anthropic key="sk-ant-..."
vault kv put secret/aof/openai key="sk-..."

# Create policy
vault policy write aof-policy - <<EOF
path "secret/data/aof/*" {
  capabilities = ["read"]
}
EOF

# Generate token
vault token create -policy=aof-policy

MCP Server Setup

Stdio Transport:

tools:
  - name: filesystem
    type: mcp
    config:
      command: npx
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
      transport: stdio

HTTP Transport:

tools:
  - name: remote-api
    type: mcp
    config:
      url: "https://mcp.example.com"
      transport: http
      headers:
        Authorization: "Bearer ${MCP_API_KEY}"

SSE Transport:

tools:
  - name: streaming-api
    type: mcp
    config:
      url: "https://mcp.example.com/sse"
      transport: sse
      reconnect: true
      reconnect_delay_ms: 1000

5. Security

Webhook Signature Verification

WhatsApp Signature Verification:

use hmac::{Hmac, Mac};
use sha2::Sha256;

fn verify_whatsapp_signature(
    payload: &[u8],
    signature: &str,
    app_secret: &str,
) -> bool {
    type HmacSha256 = Hmac<Sha256>;

    let expected = signature.strip_prefix("sha256=").unwrap_or(signature);

    let mut mac = HmacSha256::new_from_slice(app_secret.as_bytes())
        .expect("HMAC can take key of any size");
    mac.update(payload);

    let result = mac.finalize();
    let code_bytes = result.into_bytes();

    hex::encode(code_bytes) == expected
}
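
To sanity-check a verification implementation, you can compute the same HMAC from the command line and compare it against the `X-Hub-Signature-256` header (assumes `openssl` is installed; the payload and secret below are throwaway examples):

```shell
payload='{"object":"whatsapp_business_account"}'
app_secret='test-app-secret'

# Hex-encoded HMAC-SHA256 of the raw request body, keyed with the app secret
sig=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$app_secret" | awk '{print $NF}')
echo "sha256=$sig"
```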

Telegram Signature Verification:

// Telegram does not sign the payload; it echoes back the secret token you
// registered, in the X-Telegram-Bot-Api-Secret-Token header.
fn verify_telegram_signature(
    secret_token: &str,
    header_token: &str,
) -> bool {
    secret_token == header_token
}

Slack Signature Verification:

use hmac::{Hmac, Mac};
use sha2::Sha256;

fn verify_slack_signature(
    signing_secret: &str,
    timestamp: &str,
    body: &str,
    signature: &str,
) -> bool {
    type HmacSha256 = Hmac<Sha256>;

    let sig_basestring = format!("v0:{}:{}", timestamp, body);

    let mut mac = HmacSha256::new_from_slice(signing_secret.as_bytes())
        .expect("HMAC can take key of any size");
    mac.update(sig_basestring.as_bytes());

    let result = mac.finalize();
    let code_bytes = result.into_bytes();
    let computed = format!("v0={}", hex::encode(code_bytes));

    computed == signature
}

Discord Signature Verification:

use ed25519_dalek::{PublicKey, Signature, Verifier};

fn verify_discord_signature(
    public_key: &str,
    signature: &str,
    timestamp: &str,
    body: &str,
) -> bool {
    let message = format!("{}{}", timestamp, body);

    let public_key_bytes = hex::decode(public_key).unwrap();
    let signature_bytes = hex::decode(signature).unwrap();

    let public_key = PublicKey::from_bytes(&public_key_bytes).unwrap();
    let signature = Signature::from_bytes(&signature_bytes).unwrap();

    public_key.verify(message.as_bytes(), &signature).is_ok()
}

Rate Limiting

Application-Level Rate Limiting:

use governor::{Quota, RateLimiter};
use std::num::NonZeroU32;

// Create rate limiter: 100 requests per minute
let quota = Quota::per_minute(NonZeroU32::new(100).unwrap());
let limiter = RateLimiter::direct(quota);

// Check rate limit
if limiter.check().is_err() {
    return Err("Rate limit exceeded");
}

Nginx Rate Limiting:

http {
    limit_req_zone $binary_remote_addr zone=webhook_limit:10m rate=10r/s;

    server {
        location /webhooks/ {
            limit_req zone=webhook_limit burst=20 nodelay;
            proxy_pass http://aof-backend;
        }
    }
}

Kubernetes Rate Limiting (Ingress):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aof-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/limit-connections: "100"
spec:
  rules:
    - host: aof.example.com
      http:
        paths:
          - path: /webhooks
            pathType: Prefix
            backend:
              service:
                name: aof-agent
                port:
                  number: 80

API Key Rotation

Automated Rotation Script:

#!/bin/bash
# rotate-keys.sh

# Rotate Anthropic API key
NEW_KEY=$(vault kv get -field=key secret/aof/anthropic-new)
kubectl set env deployment/aof-agent \
  ANTHROPIC_API_KEY="${NEW_KEY}" \
  -n aof-system

# Wait for rollout
kubectl rollout status deployment/aof-agent -n aof-system

# Archive old key
vault kv put secret/aof/anthropic-archived \
  key="$(vault kv get -field=key secret/aof/anthropic)" \
  rotated_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# Update current key
vault kv put secret/aof/anthropic key="${NEW_KEY}"

echo "API key rotation complete"
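
Before swapping a key into the deployment, a cheap format check prevents a bad Vault read from rolling out an empty or mangled value. The `sk-ant-` prefix check below is an assumption about Anthropic's key format; adjust it for your provider:

```shell
# Abort rotation early if the fetched key is empty or has an unexpected prefix.
check_key_format() {
  case "$1" in
    sk-ant-?*) echo "key format ok" ;;
    *) echo "unexpected key format" >&2; return 1 ;;
  esac
}

check_key_format "sk-ant-api03-example"
```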

Rotation Schedule (Cron):

# /etc/cron.d/aof-key-rotation
0 2 1 * * /usr/local/bin/rotate-keys.sh >> /var/log/aof/key-rotation.log 2>&1

Network Policies

Kubernetes NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aof-agent-network-policy
  namespace: aof-system
spec:
  podSelector:
    matchLabels:
      app: aof-agent
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
    # Allow Redis
    - to:
        - podSelector:
            matchLabels:
              app: aof-redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow HTTPS to LLM providers (external destinations, so use an ipBlock
    # rather than a podSelector, which only matches pods in this namespace)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443

Firewall Rules (iptables):

# Allow return traffic for established connections (required for outbound API calls)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow incoming webhook traffic
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT

# Allow SSH for administration
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow outgoing HTTPS to LLM providers
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT

# Drop all other incoming traffic
iptables -A INPUT -j DROP

6. Monitoring

Logging Configuration

Structured Logging:

# Environment variable configuration
export RUST_LOG="info,aof_core=debug,aof_runtime=info,aof_llm=debug"

# JSON logging for production
export RUST_LOG_FORMAT="json"

Log Rotation:

# /etc/logrotate.d/aof
/var/log/aof/*.log {
    daily
    rotate 30
    compress
    delaycompress
    notifempty
    create 0644 aof aof
    sharedscripts
    postrotate
        systemctl reload aof-agent > /dev/null 2>&1 || true
    endscript
}

Centralized Logging (Fluent Bit):

# fluent-bit.conf
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info

[INPUT]
    Name    tail
    Path    /var/log/aof/*.log
    Parser  json
    Tag     aof.*

[OUTPUT]
    Name   es
    Match  aof.*
    Host   elasticsearch
    Port   9200
    Index  aof-logs
    Type   _doc

Kubernetes Logging (Fluentd):

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/aof-agent*.log
      pos_file /var/log/fluentd-aof.log.pos
      tag kubernetes.aof
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <match kubernetes.aof>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      index_name aof-logs
      type_name _doc
    </match>

Metrics Collection

Prometheus Metrics:

use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts, Registry, TextEncoder};

lazy_static! {
    static ref REGISTRY: Registry = Registry::new();

    static ref AGENT_EXECUTIONS: Counter = Counter::new(
        "aof_agent_executions_total",
        "Total number of agent executions"
    ).unwrap();

    static ref EXECUTION_DURATION: Histogram = Histogram::with_opts(
        HistogramOpts::new(
            "aof_execution_duration_seconds",
            "Agent execution duration in seconds"
        )
    ).unwrap();

    static ref LLM_REQUESTS: Counter = Counter::new(
        "aof_llm_requests_total",
        "Total number of LLM API requests"
    ).unwrap();

    static ref LLM_TOKENS: Counter = Counter::new(
        "aof_llm_tokens_total",
        "Total number of tokens consumed"
    ).unwrap();
}

// Export metrics endpoint (returns Prometheus text format)
async fn metrics_handler() -> String {
    let encoder = TextEncoder::new();
    let metric_families = REGISTRY.gather();
    encoder.encode_to_string(&metric_families).unwrap_or_default()
}

Prometheus Scrape Config:

# prometheus.yml
scrape_configs:
  - job_name: 'aof-agents'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - aof-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: aof-agent
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8080

ServiceMonitor (Prometheus Operator):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aof-agent
  namespace: aof-system
spec:
  selector:
    matchLabels:
      app: aof-agent
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Alerting Setup

Prometheus Alerts:

# alerts.yml
groups:
  - name: aof-alerts
    interval: 30s
    rules:
      - alert: AOFAgentDown
        expr: up{job="aof-agents"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AOF agent is down"
          description: "AOF agent {{ $labels.instance }} has been down for 5 minutes"

      - alert: AOFHighErrorRate
        expr: rate(aof_agent_errors_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in AOF agent"
          description: "Error rate is {{ $value }} errors/sec on {{ $labels.instance }}"

      - alert: AOFHighLatency
        expr: histogram_quantile(0.95, rate(aof_execution_duration_seconds_bucket[5m])) > 30
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High execution latency"
          description: "P95 latency is {{ $value }}s on {{ $labels.instance }}"

      - alert: AOFHighTokenUsage
        expr: rate(aof_llm_tokens_total[1h]) > 100000
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "High LLM token usage"
          description: "Token usage is {{ $value }} tokens/sec"

AlertManager Config:

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: pagerduty
    - match:
        severity: warning
      receiver: slack

receivers:
  - name: 'default'
    email_configs:
      - to: 'team@example.com'
        from: 'alerts@example.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'alerts@example.com'
        auth_password: 'password'

  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts'
        title: 'AOF Alert: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'xxx'
        description: '{{ .GroupLabels.alertname }}'

Health Checks

Application Health Check:

use axum::{http::StatusCode, routing::get, Router};

async fn health_check() -> &'static str {
    "OK"
}

async fn readiness_check() -> Result<&'static str, StatusCode> {
    // redis_available() and llm_provider_available() are app-specific checks
    if redis_available().await && llm_provider_available().await {
        Ok("READY")
    } else {
        Err(StatusCode::SERVICE_UNAVAILABLE)
    }
}

let app = Router::new()
    .route("/health", get(health_check))
    .route("/ready", get(readiness_check));

Kubernetes Probes:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 3

External Health Monitoring:

# UptimeRobot-style check
*/5 * * * * curl -f https://aof.example.com/health || echo "Health check failed" | mail -s "AOF Down" alerts@example.com

7. Scaling

Horizontal Scaling

Docker Compose Scale:

# Scale to 5 replicas
docker-compose up -d --scale aof-agent=5

# With load balancer
docker-compose -f docker-compose.yml -f docker-compose.scale.yml up -d

Kubernetes HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aof-agent
  namespace: aof-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aof-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: aof_execution_duration_seconds
        target:
          type: AverageValue
          averageValue: "10"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max

KEDA (Event-Driven Autoscaling):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aof-agent-scaler
  namespace: aof-system
spec:
  scaleTargetRef:
    name: aof-agent
  minReplicaCount: 3
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: aof_queue_depth
        threshold: '10'
        query: sum(aof_pending_tasks)

Load Balancing

Nginx Load Balancer:

upstream aof_backend {
    least_conn;
    server aof-1:8080 max_fails=3 fail_timeout=30s;
    server aof-2:8080 max_fails=3 fail_timeout=30s;
    server aof-3:8080 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    listen 80;
    server_name aof.example.com;

    location / {
        proxy_pass http://aof_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 60s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
    }
}

HAProxy Configuration:

global
    maxconn 4096
    daemon

defaults
    mode http
    timeout connect 10s
    timeout client 300s
    timeout server 300s

frontend aof_frontend
    bind *:80
    default_backend aof_backend

backend aof_backend
    balance leastconn
    option httpchk GET /health
    http-check expect status 200
    server aof-1 10.0.1.10:8080 check inter 5s rise 2 fall 3
    server aof-2 10.0.1.11:8080 check inter 5s rise 2 fall 3
    server aof-3 10.0.1.12:8080 check inter 5s rise 2 fall 3

Database Configuration

Redis Cluster:

# docker-compose.redis-cluster.yml
version: '3.8'

services:
  redis-node-1:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    volumes:
      - redis-1:/data

  redis-node-2:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    volumes:
      - redis-2:/data

  redis-node-3:
    image: redis:7-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    volumes:
      - redis-3:/data

  redis-cluster-init:
    image: redis:7-alpine
    command: redis-cli --cluster create redis-node-1:6379 redis-node-2:6379 redis-node-3:6379 --cluster-replicas 0 --cluster-yes
    depends_on:
      - redis-node-1
      - redis-node-2
      - redis-node-3

volumes:
  redis-1:
  redis-2:
  redis-3:

Redis Sentinel (High Availability):

# sentinel.conf
sentinel monitor aof-master 10.0.1.10 6379 2
sentinel down-after-milliseconds aof-master 5000
sentinel parallel-syncs aof-master 1
sentinel failover-timeout aof-master 10000

Application Redis Config:

memory:
  backend: redis
  config:
    # Sentinel configuration
    sentinels:
      - host: sentinel-1
        port: 26379
      - host: sentinel-2
        port: 26379
      - host: sentinel-3
        port: 26379
    master_name: aof-master
    db: 0
    pool_size: 20
    timeout_seconds: 5

Caching Strategies

Redis Caching Layer:

use redis::{Client, Commands};

struct CachedLLMProvider {
    provider: Box<dyn Model>,
    cache: Client,
}

impl CachedLLMProvider {
    async fn generate(&self, request: &ModelRequest) -> AofResult<ModelResponse> {
        let cache_key = format!("llm:{}:{}", request.model, hash(&request.messages));
        // Commands are issued on a connection, not on the Client itself
        let mut conn = self.cache.get_connection()?;

        // Check cache first
        if let Ok(cached) = conn.get::<_, String>(&cache_key) {
            return Ok(serde_json::from_str(&cached)?);
        }

        // Cache miss: call the underlying provider
        let response = self.provider.generate(request).await?;

        // Cache the serialized response (TTL: 1 hour)
        let _: () = conn.set_ex(
            &cache_key,
            serde_json::to_string(&response)?,
            3600,
        )?;

        Ok(response)
    }
}
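The `hash(...)` call in the snippet above is left abstract. A minimal standard-library sketch of such a helper is shown below; `hash_messages` is a hypothetical name, and note that `DefaultHasher` is only stable within one process, so a cache shared across binaries or Rust versions should use a fixed algorithm such as SHA-256 instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: derive a cache-key component from the serialized
// request messages. DefaultHasher is deterministic within one process but
// not guaranteed stable across Rust releases.
fn hash_messages(serialized: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    serialized.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = hash_messages(r#"[{"role":"user","content":"hi"}]"#);
    let b = hash_messages(r#"[{"role":"user","content":"hi"}]"#);
    // Identical payloads must produce identical cache keys
    assert_eq!(a, b);
    println!("llm:claude-3:{:x}", a);
}
```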

CDN Caching (CloudFlare):

# Add cache headers for static content
location /static/ {
expires 1y;
add_header Cache-Control "public, immutable";
}

# Cache API responses
location /api/public/ {
proxy_cache api_cache;
proxy_cache_valid 200 10m;
proxy_cache_key "$request_uri";
add_header X-Cache-Status $upstream_cache_status;
}

8. Troubleshooting

Common Issues

Issue 1: Agent Execution Timeout

Error: Agent execution timed out after 300 seconds

Solution:

# Increase timeout in agent.yaml
spec:
  limits:
    max_execution_time_seconds: 1800  # 30 minutes

Issue 2: Memory Exhausted

Error: Cannot allocate memory

Solution:

# Check memory usage
docker stats

# Increase container memory
docker run -m 4g aof:latest

# Kubernetes
resources:
  limits:
    memory: "4Gi"

Issue 3: Redis Connection Failed

Error: Connection refused (redis://localhost:6379)

Solution:

# Check Redis status
redis-cli ping

# Verify connection
telnet localhost 6379

# Check network
docker network inspect aof-network

# Update connection string
export REDIS_URL="redis://redis-host:6379"

Issue 4: LLM API Rate Limit

Error: Rate limit exceeded (429)

Solution:

// Implement exponential backoff with a bounded number of retries
use tokio::time::{sleep, Duration};

for attempt in 0..5 {
    match provider.generate(request).await {
        Ok(response) => return Ok(response),
        Err(e) if e.is_rate_limit() && attempt < 4 => {
            // Back off 1s, 2s, 4s, 8s before the next attempt
            let delay = 2_u64.pow(attempt) * 1000;
            sleep(Duration::from_millis(delay)).await;
        }
        // Non-retryable error, or rate-limited on the final attempt
        Err(e) => return Err(e),
    }
}
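The delay schedule used above (2^attempt * 1000 ms) can be precomputed and tested in isolation. A small sketch, with the base delay as a parameter (`backoff_delays_ms` is an illustrative helper, not part of AOF; a production version would typically cap the maximum delay and add jitter):

```rust
// Sketch: exponential backoff schedule matching the 2^attempt * base
// pattern used in the retry loop above.
fn backoff_delays_ms(attempts: u32, base_ms: u64) -> Vec<u64> {
    (0..attempts).map(|a| base_ms * 2u64.pow(a)).collect()
}

fn main() {
    let delays = backoff_delays_ms(5, 1000);
    // 1s, 2s, 4s, 8s, 16s
    assert_eq!(delays, vec![1000, 2000, 4000, 8000, 16000]);
    println!("{:?}", delays);
}
```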

Issue 5: Webhook Signature Verification Failed

Error: Invalid signature

Solution:

# Verify secret is correct
echo $WHATSAPP_VERIFY_TOKEN

# Check webhook payload
tail -f /var/log/aof/webhooks.log

# Test signature locally
curl -X POST http://localhost:8080/webhooks/whatsapp \
  -H "X-Hub-Signature-256: sha256=..." \
  -d @payload.json
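When comparing the received `X-Hub-Signature-256` header against the locally computed HMAC, use a constant-time comparison: an early-exit byte-by-byte comparison leaks timing information. A minimal sketch (assuming `expected` was produced with HMAC-SHA256 over the raw request body, e.g. via the `hmac` and `sha2` crates; `signatures_match` is an illustrative helper):

```rust
// Constant-time equality check for webhook signatures. XOR-accumulating
// over all bytes avoids the timing side channel of early-exit comparison.
fn signatures_match(received: &str, expected: &str) -> bool {
    if received.len() != expected.len() {
        return false;
    }
    received
        .bytes()
        .zip(expected.bytes())
        .fold(0u8, |acc, (a, b)| acc | (a ^ b))
        == 0
}

fn main() {
    assert!(signatures_match("sha256=abc123", "sha256=abc123"));
    assert!(!signatures_match("sha256=abc123", "sha256=abc124"));
    println!("ok");
}
```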

Debug Mode

Enable Debug Logging:

# Full debug
export RUST_LOG="debug"

# Selective debug
export RUST_LOG="info,aof_core=debug,aof_llm=trace"

# Force styled (colored) output, e.g. when piping logs
export RUST_LOG="info,aof_core=debug"
export RUST_LOG_STYLE="always"

Interactive Debugging:

# Run with debugger
rust-lldb target/debug/aofctl

# Set breakpoints
b aof_core::agent::execute
run --config agent.yaml

# Inspect variables
p request
p response

Trace Network Calls:

# Enable HTTP tracing
export RUST_LOG="reqwest=trace"

# Capture with tcpdump
sudo tcpdump -i any -w aof-traffic.pcap port 443

# Analyze with wireshark
wireshark aof-traffic.pcap

Log Analysis

Parse JSON Logs:

# Extract errors
jq 'select(.level == "ERROR")' /var/log/aof/agent.log

# Count by error type
jq -r '.error_type' /var/log/aof/agent.log | sort | uniq -c

# Filter by time range
jq 'select(.timestamp > "2024-01-01T00:00:00Z")' /var/log/aof/agent.log

Elasticsearch Queries:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "app": "aof-agent" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ],
      "should": [
        { "match": { "level": "ERROR" } },
        { "match": { "level": "WARN" } }
      ]
    }
  },
  "aggs": {
    "errors_by_type": {
      "terms": { "field": "error_type.keyword" }
    }
  }
}

Loki LogQL:

{app="aof-agent"}
  |= "ERROR"
  | json
  | level="ERROR"
  | line_format "{{.timestamp}} {{.error_type}}: {{.message}}"

Performance Tuning

Profile CPU Usage:

# Install flamegraph
cargo install flamegraph

# Profile application
sudo flamegraph target/release/aofctl run --config agent.yaml

# View flamegraph.svg in browser

Memory Profiling:

# Use valgrind
valgrind --tool=massif --massif-out-file=massif.out \
  target/release/aofctl run --config agent.yaml

# Analyze
ms_print massif.out

Benchmark Performance:

# Load testing with k6
k6 run - <<EOF
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 0 },
  ],
};

export default function() {
  let res = http.post('http://localhost:8080/webhooks/test',
    JSON.stringify({ message: 'test' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, { 'status is 200': (r) => r.status === 200 });
}
EOF

Database Performance:

# Redis slowlog
redis-cli slowlog get 10

# Monitor commands
redis-cli monitor

# Benchmark
redis-benchmark -h localhost -p 6379 -c 50 -n 10000

LLM Provider Latency:

use std::time::Instant;

let start = Instant::now();
let response = provider.generate(request).await?;
let duration = start.elapsed();

metrics::histogram!("aof_llm_latency_seconds", duration.as_secs_f64());
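For quick offline analysis of the recorded samples, percentiles can be computed directly with nearest-rank selection; the `percentile` helper below is illustrative, not part of AOF (assumption: samples are latencies in seconds):

```rust
// Sketch: nearest-rank percentile over collected latency samples.
// Sorts in place, then indexes by the rounded rank.
fn percentile(samples: &mut Vec<f64>, p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((samples.len() - 1) as f64 * p).round() as usize;
    samples[idx]
}

fn main() {
    let mut s = vec![0.8, 1.2, 0.9, 3.5, 1.1, 0.7, 2.0, 1.0];
    let p50 = percentile(&mut s, 0.50);
    let p95 = percentile(&mut s, 0.95);
    // p95 is dominated by the slowest requests
    assert!(p50 <= p95);
    println!("p50={:.2}s p95={:.2}s", p50, p95);
}
```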

Appendix

Quick Reference

Essential Commands:

# Build
cargo build --release --workspace

# Run agent
aofctl run --config agent.yaml --input "task"

# Run webhook server
aof-triggers --config triggers.yaml

# Check logs
journalctl -u aof-agent -f

# Health check
curl http://localhost:8080/health

# Metrics
curl http://localhost:8080/metrics

Environment Variables:

ANTHROPIC_API_KEY        # Anthropic API key
OPENAI_API_KEY           # OpenAI API key
RUST_LOG                 # Logging level
AOF_MEMORY_BACKEND       # Memory backend (memory, redis, sled, file)
REDIS_URL                # Redis connection string
AOF_WEBHOOK_PORT         # Webhook server port

Default Ports:

  • 8080 - Main application / webhook server
  • 8081 - Metrics endpoint
  • 6379 - Redis
  • 9090 - Prometheus
  • 3000 - Grafana

Support Resources


Last Updated: 2024-12-10
Version: 1.0.0