Deep Analysis Fleet Tutorial

Learn how to build an agentic investigation system using AOF's deep coordination mode. This pattern enables iterative, self-directed analysis: a planner agent creates an investigation plan, worker agents execute it, and the loop repeats until the findings reach a confidence threshold or the iteration limit is hit.

What You'll Build

A multi-agent fleet that:

Uses deep coordination for iterative investigation
Integrates MCP servers for filesystem and GitHub access
Combines code analysis and infrastructure analysis
Synthesizes findings into actionable conclusions

Prerequisites

AOF installed (curl -sSL https://docs.aof.sh/install.sh | bash)
Google AI API key (or other supported provider)
Node.js (for MCP servers)
kubectl configured (for infrastructure analysis)
Optional: GitHub token, Prometheus/Loki endpoints
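
Before starting, it can help to confirm that the command-line tools listed above are actually on your PATH. The following is a minimal sketch, not part of AOF: the function name `missing_tools` and the grouping of tools into lists are assumptions made for illustration, and only `shutil.which` from the Python standard library is used.

```python
import shutil
from typing import Callable, Optional

# Executables this tutorial invokes. The grouping is an assumption:
# kubectl is only needed by the infra-analyzer agent.
CORE_TOOLS = ["aofctl", "node", "npx"]
INFRA_TOOLS = ["kubectl"]


def missing_tools(
    tools: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("core", CORE_TOOLS), ("infra", INFRA_TOOLS)):
        absent = missing_tools(tools)
        status = "all found" if not absent else "missing " + ", ".join(absent)
        print(f"{label}: {status}")
```

The `which` parameter exists only so the check can be exercised without touching the real PATH.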

Understanding Deep Coordination

Traditional coordination modes (hierarchical, peer, pipeline) have fixed execution patterns. Deep coordination is an agentic pattern where:

1. The planner analyzes the problem and creates an investigation plan.
2. Workers execute the investigation steps and report findings.
3. The planner reviews the findings and refines the plan.
4. The loop continues until the confidence threshold is met or max_iterations is reached.
5. The synthesizer combines all findings into conclusions.

┌─────────────────────────────────────────────────────────────────┐
│                     Deep Coordination Flow                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐     ┌───────────────┐     ┌──────────────────┐    │
│  │  Task    │────▶│   Planner     │────▶│  Investigation   │    │
│  │  Input   │     │  (Manager)    │     │     Plan         │    │
│  └──────────┘     └───────────────┘     └────────┬─────────┘    │
│                          ▲                       │              │
│                          │                       ▼              │
│                   ┌──────┴──────┐       ┌──────────────────┐    │
│                   │   Refine    │       │  Worker Agents   │    │
│                   │    Plan     │       │  Execute Steps   │    │
│                   └──────┬──────┘       └────────┬─────────┘    │
│                          ▲                       │              │
│                          │ Confidence < 0.8      │              │
│                          │ (loop)                │              │
│                   ┌──────┴──────────────┐        │              │
│                   │      Findings       │◀───────┘              │
│                   └──────┬──────────────┘                       │
│                          │                                      │
│                          │ Confidence ≥ 0.8                     │
│                          ▼                                      │
│                   ┌─────────────────────┐                       │
│                   │    Synthesizer      │                       │
│                   │  Final Conclusions  │                       │
│                   └─────────────────────┘                       │
└─────────────────────────────────────────────────────────────────┘
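
The control flow described above can be sketched in a few lines of Python. This is an illustration of the pattern, not AOF's implementation: the `Assessment` type, the `run_deep` function, and the callable signatures are all hypothetical, and real agents would be LLM-backed rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assessment:
    """What the planner returns each cycle (hypothetical shape)."""
    plan: list[str]
    hypothesis: str
    confidence: float  # 0.0 to 1.0


# Hypothetical stand-ins for agents; findings are plain strings here.
Planner = Callable[[str, list[str]], Assessment]
Worker = Callable[[list[str]], list[str]]
Synthesizer = Callable[[str, list[str]], str]


def run_deep(
    task: str,
    planner: Planner,
    workers: list[Worker],
    synthesizer: Synthesizer,
    max_iterations: int = 5,
    confidence_threshold: float = 0.8,
) -> tuple[str, int]:
    """Plan, investigate, and refine until confident or out of cycles.

    Returns the synthesizer's conclusion and the number of planning cycles.
    """
    findings: list[str] = []
    cycles = 0
    for cycle in range(1, max_iterations + 1):
        cycles = cycle
        # The planner sees every finding so far and creates or refines the plan.
        assessment = planner(task, findings)
        if assessment.confidence >= confidence_threshold:
            break
        # Each worker executes the current plan and reports new findings.
        for worker in workers:
            findings.extend(worker(assessment.plan))
    return synthesizer(task, findings), cycles
```

Note the two exits from the loop: the confidence threshold and the cycle limit. Either way the synthesizer runs once on whatever findings were collected.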

Step 1: Create the Fleet Configuration

Create deep-analysis-fleet.yaml:

apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: deep-analysis-fleet
  labels:
    purpose: investigation
    pattern: deep-agentic

spec:
  agents:
    # Planner - creates and refines investigation plans
    - name: planner
      role: manager
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are a senior investigation planner. Your role is to:

          1. Analyze the problem statement and current findings
          2. Create a structured investigation plan with specific steps
          3. Prioritize steps based on likelihood of finding root cause
          4. Refine the plan based on worker agent findings

          Output investigation plans in this format:

          INVESTIGATION PLAN:
          1. [Step description] - [Expected outcome]
          2. [Step description] - [Expected outcome]

          PRIORITY: [high|medium|low]
          HYPOTHESIS: [Current working hypothesis]
        tools:
          - shell
        max_iterations: 5
        temperature: 0.3

    # Code analyzer with MCP filesystem access
    - name: code-analyzer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are a code analysis specialist. Examine source code
          for bugs, misconfigurations, and recent changes that might
          cause the reported issue.

          Always provide specific file paths and line numbers.
        tools:
          - git
          - shell
        mcp_servers:
          - name: filesystem
            transport: stdio
            command: npx
            args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
        max_iterations: 8
        temperature: 0.2

    # Infrastructure analyzer
    - name: infra-analyzer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are an infrastructure analysis specialist. Query
          Kubernetes, logs, and metrics to identify infrastructure
          issues causing the reported problem.
        tools:
          - kubectl
          - prometheus_query
          - loki_query
        max_iterations: 8
        temperature: 0.2

    # Synthesizer - combines findings
    - name: synthesizer
      role: validator
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are an analysis synthesizer. Review all findings,
          identify patterns, determine root cause with confidence
          level, and recommend remediation steps.
        tools:
          - shell
        max_iterations: 5
        temperature: 0.3

  coordination:
    mode: deep
    manager: planner
    deep:
      max_iterations: 5
      confidence_threshold: 0.8
      context_strategy: cumulative
      investigation_agents:
        - code-analyzer
        - infra-analyzer
      synthesis_agent: synthesizer
    distribution: capability_based
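
A common mistake in a config like this is a coordination entry that names an agent which is not defined (for example a typo in `synthesis_agent`). The sketch below checks those cross-references on an already-parsed config. It is an illustration only: `check_fleet_references` is a hypothetical helper, AOF may do its own validation, and the nesting (`investigation_agents` and `synthesis_agent` under `coordination.deep`) is an assumption about the schema.

```python
def check_fleet_references(fleet: dict) -> list[str]:
    """Return problems with agent names referenced under spec.coordination."""
    spec = fleet.get("spec", {})
    agent_names = {agent["name"] for agent in spec.get("agents", [])}
    coordination = spec.get("coordination", {})
    deep = coordination.get("deep", {})  # assumed location of deep-mode settings

    referenced = {
        "coordination.manager": [coordination.get("manager")],
        "coordination.deep.investigation_agents": deep.get("investigation_agents", []),
        "coordination.deep.synthesis_agent": [deep.get("synthesis_agent")],
    }

    problems = []
    for field_path, names in referenced.items():
        for name in names:
            if name is None:
                problems.append(f"{field_path} is not set")
            elif name not in agent_names:
                problems.append(f"{field_path} references unknown agent '{name}'")
    return problems


# A trimmed-down version of the fleet above, as a parsed YAML document would look.
fleet = {
    "spec": {
        "agents": [
            {"name": "planner"},
            {"name": "code-analyzer"},
            {"name": "infra-analyzer"},
            {"name": "synthesizer"},
        ],
        "coordination": {
            "mode": "deep",
            "manager": "planner",
            "deep": {
                "investigation_agents": ["code-analyzer", "infra-analyzer"],
                "synthesis_agent": "synthesizer",
            },
        },
    }
}
problems = check_fleet_references(fleet)
```

To run this against the real file, load it first with a YAML parser such as the third-party PyYAML package (`yaml.safe_load`) and pass the resulting dict in.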

Step 2: Add MCP Server Integration

Step 1 gave the code-analyzer agent a filesystem MCP server. To also give it GitHub access (PR history, issues), extend its mcp_servers list:

mcp_servers:
  # Filesystem access for reading code
  - name: filesystem
    transport: stdio
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]

  # GitHub access for PR history, issues, etc.
  - name: github
    transport: stdio
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"

Optionally pre-install the MCP servers:

# These are installed on-demand by npx, but you can pre-install:
npm install -g @modelcontextprotocol/server-filesystem
npm install -g @modelcontextprotocol/server-github

Step 3: Configure Environment

Set up required environment variables:

# Required
export GOOGLE_API_KEY="your-google-api-key"

# For infrastructure analysis
export KUBECONFIG="${HOME}/.kube/config"
export PROMETHEUS_URL="http://prometheus:9090"
export LOKI_URL="http://loki:3100"

# For GitHub MCP server (optional)
export GITHUB_TOKEN="ghp_your_github_token"
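
A quick way to catch a missing variable before a run is a small preflight check. The sketch below is not part of AOF; `check_environment` is a hypothetical helper, and treating everything except `GOOGLE_API_KEY` as optional is an assumption based on the comments above and the prerequisites list.

```python
import os
from typing import Mapping

# Variable names come from this tutorial; the required/optional split is an assumption.
REQUIRED_VARS = ["GOOGLE_API_KEY"]
OPTIONAL_VARS = ["KUBECONFIG", "PROMETHEUS_URL", "LOKI_URL", "GITHUB_TOKEN"]


def check_environment(
    env: Mapping[str, str] = os.environ,
) -> tuple[list[str], list[str]]:
    """Return (missing required, missing optional) variable names.

    A variable set to an empty string counts as missing.
    """
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_environment()
    if required:
        print("Missing required variables:", ", ".join(required))
    if optional:
        print("Not set (some agents may lack data):", ", ".join(optional))
```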

Step 4: Run the Fleet

Execute the fleet with an investigation task:

# Basic usage
aofctl run fleet deep-analysis-fleet.yaml \
  --task "Analyze why the payment service is returning 500 errors"

# With verbose output
aofctl run fleet deep-analysis-fleet.yaml \
  --task "Investigate high latency in the checkout API" \
  --verbose

# With custom working directory for MCP
aofctl run fleet deep-analysis-fleet.yaml \
  --task "Find the cause of memory leaks in user-service" \
  --env WORKSPACE=/path/to/your/repo

Step 5: Understanding the Output

The fleet produces structured output at each iteration:

Iteration 1: Initial Plan

=== DEEP ANALYSIS: Iteration 1 ===
[Planner] Creating initial investigation plan...

INVESTIGATION PLAN:
1. Check payment-service pod logs for error patterns
2. Query Prometheus for error rate and latency metrics
3. Review recent deployments and config changes
4. Examine payment-service source for error handling

PRIORITY: high
HYPOTHESIS: None yet - gathering initial data

Iteration 2: First Findings

=== DEEP ANALYSIS: Iteration 2 ===
[code-analyzer] Analyzing recent changes...
  Finding: PR #234 modified payment validation logic 2 days ago
  Finding: New timeout configuration in config/payment.yaml

[infra-analyzer] Checking infrastructure...
  Finding: payment-service pods showing increased memory usage
  Finding: 503 errors correlate with memory pressure

[Planner] Refining plan based on findings...

INVESTIGATION PLAN:
1. Deep dive into PR #234 changes
2. Check memory limits vs actual usage
3. Trace payment validation code path

HYPOTHESIS: Memory pressure causing request failures
CONFIDENCE: 0.6

Final Synthesis

=== DEEP ANALYSIS: Complete ===
[Synthesizer] Combining findings...

FINDINGS SUMMARY:
- PR #234 introduced unbounded cache in PaymentValidator
- Memory usage increased 300% after deployment
- Pods hitting memory limits, triggering OOMKills
- Requests failing during garbage collection pauses

ROOT CAUSE: Memory leak in PaymentValidator cache
CONFIDENCE: high (0.92)

EVIDENCE:
- Memory growth pattern matches cache accumulation
- Errors correlate with GC pause times
- Rolling restart temporarily resolves issue

REMEDIATION:
1. Immediate: Increase memory limits to 2Gi
2. Short-term: Add cache eviction policy
3. Follow-up: Review PR #234, add cache size monitoring

PREVENTION:
- Add memory usage alerts
- Include memory profiling in CI
- Require cache size limits in code review

Advanced Configuration

Custom Investigation Agents

Add domain-specific specialists:

agents:
  # Database specialist
  - name: db-analyzer
    role: specialist
    spec:
      model: google:gemini-2.5-flash
      instructions: |
        You are a database specialist. Analyze query performance,
        connection pools, and data consistency issues.
      mcp_servers:
        - name: postgres
          transport: stdio
          command: npx
          args: ["-y", "@modelcontextprotocol/server-postgres"]
          env:
            DATABASE_URL: "${DATABASE_URL}"

Confidence Tuning

Adjust when the investigation concludes:

coordination:
  deep:
    # Higher threshold = more thorough investigation
    confidence_threshold: 0.9

    # More iterations allowed
    max_iterations: 10

    # How context is passed between iterations
    # - cumulative: All findings accumulate
    # - windowed: Only last N findings
    # - summarized: LLM summarizes findings
    context_strategy: cumulative
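
To make the three context strategies concrete, here is a minimal sketch of what each one could pass to the next iteration. It is an interpretation of the comments above, not AOF's code: `build_context`, the `window` parameter, and the `summarize` callable are hypothetical, and how AOF sizes the window or produces the summary is not specified here.

```python
from typing import Callable, Optional


def build_context(
    findings: list[str],
    strategy: str = "cumulative",
    window: int = 3,
    summarize: Optional[Callable[[list[str]], str]] = None,
) -> list[str]:
    """Select which findings are carried into the next planning cycle."""
    if strategy == "cumulative":
        # Everything found so far; context grows with each iteration.
        return list(findings)
    if strategy == "windowed":
        # Only the most recent `window` findings; older ones are dropped.
        return findings[-window:] if window > 0 else []
    if strategy == "summarized":
        # A single condensed entry; in practice `summarize` would call an LLM.
        if summarize is None:
            raise ValueError("the summarized strategy needs a summarize callable")
        return [summarize(findings)] if findings else []
    raise ValueError(f"unknown context_strategy: {strategy!r}")
```

The trade-off is context size against recall: cumulative keeps everything but grows with every cycle, while windowed and summarized bound the prompt at the cost of dropping or compressing earlier evidence.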

Multi-Model Investigation

Use different models for different roles. Each provider needs its own API key set in the environment:

agents:
  - name: planner
    spec:
      model: google:gemini-2.5-flash  # Fast for planning
      # ...

  - name: code-analyzer
    spec:
      model: anthropic:claude-3-5-sonnet  # Strong at code
      # ...

  - name: synthesizer
    spec:
      model: openai:gpt-4o  # Good at synthesis
      # ...

Best Practices

1. Clear Investigation Scope

Provide specific, actionable tasks:

# Good - specific and actionable
aofctl run fleet deep-analysis-fleet.yaml \
  --task "Find why checkout API returns 504 timeout errors after 10pm UTC"

# Too vague
aofctl run fleet deep-analysis-fleet.yaml \
  --task "Fix the API"

2. Appropriate Tool Access

Give agents only the tools they need:

# Code analyzer needs file access, not kubectl
- name: code-analyzer
  spec:
    tools: [git, shell]
    mcp_servers: [filesystem]

# Infra analyzer needs kubectl, not file access
- name: infra-analyzer
  spec:
    tools: [kubectl, prometheus_query]

3. Resource Management

Set appropriate limits for long investigations:

agents:
  - name: code-analyzer
    spec:
      max_iterations: 8  # Limit per-agent iterations
      temperature: 0.2   # Lower = more focused

coordination:
  deep:
    max_iterations: 5    # Limit total planning cycles
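
The two limits multiply, which is worth keeping in mind when estimating cost. The arithmetic below gives a rough upper bound on agent steps for one run. It rests on an assumption about how the limits compose (every planning cycle runs the planner and each worker up to its own `max_iterations`, and the synthesizer runs once at the end); AOF's actual accounting may differ, and `worst_case_agent_steps` is a hypothetical helper.

```python
def worst_case_agent_steps(
    planner_steps: int,
    worker_steps: list[int],
    synthesizer_steps: int,
    planning_cycles: int,
) -> int:
    """Upper bound on agent steps, assuming every limit is fully used."""
    per_cycle = planner_steps + sum(worker_steps)
    return planning_cycles * per_cycle + synthesizer_steps


# Values from the fleet in Step 1: planner 5, workers 8 and 8, synthesizer 5,
# and 5 planning cycles: 5 * (5 + 8 + 8) + 5 = 110.
bound = worst_case_agent_steps(5, [8, 8], 5, planning_cycles=5)
```

Under the same assumption, raising the coordination-level `max_iterations` to 10 roughly doubles this bound to 215.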

4. Secure MCP Configuration

Use environment variables for sensitive data:

mcp_servers:
  - name: github
    env:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"  # Never hardcode tokens
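
The `${GITHUB_TOKEN}` placeholder keeps the secret out of the file; the value is filled in from the environment at load time. The sketch below shows one way such substitution can work and why failing loudly on an unset variable is preferable to passing an empty token. It is an illustration, not AOF's implementation: `expand_env` is a hypothetical helper, and AOF's exact substitution rules are not documented here.

```python
import os
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace ${NAME} placeholders with values from `env`.

    Raises KeyError if a referenced variable is unset or empty, so a
    missing secret is caught before the MCP server is started.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if not env.get(name):
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)
```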

Troubleshooting

MCP Server Won't Start

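# Verify npx itself works
npx --version

# Start the filesystem server by hand with the same arguments as the fleet config;
# it should start and wait for input on stdin (press Ctrl+C to exit)
npx -y @modelcontextprotocol/server-filesystem /workspace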

# Check Node.js version (requires 18+)
node --version
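
If you script this check, the version string needs to be compared numerically rather than as text. A minimal sketch, assuming the usual `v<major>.<minor>.<patch>` format printed by `node --version`; the helper names are hypothetical.

```python
import re


def node_major_version(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v18.19.0'."""
    match = re.match(r"v?(\d+)\.\d+\.\d+", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_output!r}")
    return int(match.group(1))


def node_is_supported(version_output: str, minimum_major: int = 18) -> bool:
    """True if the reported Node.js version meets the minimum major version."""
    return node_major_version(version_output) >= minimum_major
```

Feed it the captured output of `node --version`, for example from `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`.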

Investigation Not Converging

Lower confidence_threshold (e.g., 0.7)
Increase max_iterations
Check if task is too broad - narrow the scope

Missing Findings

Verify tool permissions (kubectl access, file paths)
Check environment variables are set
Review agent logs for errors

Complete Example

See the full example at:

Fleet config: examples/fleets/deep-analysis-fleet.yaml
Agent configs: examples/agents/library/

Next Steps

Fleet Reference - Complete fleet specification
MCP Integration - Advanced MCP configuration
Tools Reference - Available built-in tools
