AgentFleet: Multi-Agent Coordination
AgentFleet enables multiple AI agents to work together on complex tasks. Think of it like a Kubernetes Deployment - multiple specialized pods working in parallel toward a common goal.
Quick Reference: Coordination Modes
| Mode | Description | Use Case |
|---|---|---|
| peer | All agents parallel + consensus | Code review, voting |
| hierarchical | Manager coordinates workers | Complex orchestration |
| pipeline | Sequential handoff | Data transformation |
| swarm | Self-organizing, load balanced | High-volume parallel |
| tiered | Tier-based parallel | Multi-model RCA |
| deep | NEW: Iterative planning + execution | Complex investigations |
Agent-First Architecture
AOF uses a simple, composable model - Agent for 95% of tasks, Fleet when sophisticated reasoning is needed:
| Concept | What It Is | Example |
|---|---|---|
| Agent | Single-purpose specialist | k8s-agent, prometheus-agent |
| Fleet | Team of agents for a purpose | devops-fleet, rca-fleet |
| Flow | Event routing to fleets | Telegram → Fleet → Response |
The key insight: Don't build "super agents" with many tools. Build focused agents, then compose them into fleets.
# ✅ GOOD: Fleet composes single-purpose agents
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: devops-fleet
spec:
  agents:
    - ref: library/k8s-agent.yaml     # kubectl, helm only
    - ref: library/docker-agent.yaml  # docker only
    - ref: library/git-agent.yaml     # git only

# ❌ BAD: One agent with too many tools
spec:
  tools: [kubectl, docker, git, terraform, aws, helm]
  # Hard to maintain, not reusable, unfocused
Why Use Fleets?
The Single Agent Problem
A single AI agent, even a powerful one like Claude or GPT-4, has limitations:
- Single perspective: One model, one viewpoint
- Blind spots: May miss domain-specific issues
- Hallucination risk: No cross-validation
- Single point of failure: If it's wrong, you're wrong
The Fleet Solution
Fleets solve these problems through specialization and consensus:
┌─────────────────────────────────────────────────────────────┐
│ SINGLE AGENT │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Generalist Agent │ │
│ │ - Knows security (somewhat) │ │
│ │ - Knows performance (somewhat) │ │
│ │ - Knows style (somewhat) │ │
│ │ - Single point of failure │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ AGENT FLEET │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Quality │ │
│ │ Specialist │ │ Specialist │ │ Specialist │ │
│ │ (focused) │ │ (focused) │ │ (focused) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ CONSENSUS │ │
│ └─────────────┘ │
│ ▼ │
│ Unified, validated result │
└─────────────────────────────────────────────────────────────┘
When to Use Fleet vs Single Agent
Use a Single Agent When:
- Task is straightforward (answer questions, write simple code)
- You need conversational memory across turns
- Cost is a primary concern
- Latency is critical (sub-second responses)
Use a Fleet When:
| Scenario | Why Fleet Wins |
|---|---|
| Code Review | 3 specialists catch more than 1 generalist |
| Incident Response | Parallel analysis of logs, metrics, traces |
| Critical Decisions | Consensus reduces hallucination risk |
| Cost Optimization | Cheap models in parallel vs expensive single model |
| Compliance/Audit | "3 agents agreed on this decision" |
| Specialization | Focused instructions → better results |
Cost Comparison
Single Claude Opus:
- 1 call × $15/1M tokens
- 1 perspective
- ~30s response time
Fleet of 3 Gemini Flash:
- 3 parallel calls × $0.075/1M tokens
- 3 perspectives + consensus
- ~5s response time (parallel execution!)
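As a back-of-the-envelope check, here is the arithmetic behind those numbers. This is an illustrative sketch only: the ~20k tokens per task and the flat per-token prices are assumptions, and real bills depend on the input/output token split.

# Illustrative cost arithmetic - token counts and flat prices are assumptions.
OPUS_PER_1M = 15.0    # $ per 1M tokens, single premium model
FLASH_PER_1M = 0.075  # $ per 1M tokens, cheap model run 3x in parallel
tokens_per_task = 20_000

single_cost = 1 * tokens_per_task / 1_000_000 * OPUS_PER_1M   # $0.3000
fleet_cost = 3 * tokens_per_task / 1_000_000 * FLASH_PER_1M   # $0.0045

print(f"single: ${single_cost:.4f}  fleet: ${fleet_cost:.4f}")
# The fleet's three calls run concurrently, so wall-clock latency
# stays near one model call rather than three.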
Coordination Modes
AOF supports six coordination modes for different use cases:
1. Peer Mode (Default)
All agents work as equals, executing in parallel with results aggregated via consensus.
coordination:
  mode: peer
  distribution: round-robin
  consensus:
    algorithm: majority
    min_votes: 2
Best for: Code review, multi-perspective analysis, voting scenarios
How it works:
- Task submitted to all agents simultaneously
- Each agent executes independently (in parallel)
- Results collected from all agents
- Consensus algorithm determines final result
      ┌──────────┐
      │   Task   │
      └────┬─────┘
           │
   ┌───────┼───────┐
   ▼       ▼       ▼
┌──────┐┌──────┐┌──────┐
│Agent1││Agent2││Agent3│  (parallel)
└──┬───┘└──┬───┘└──┬───┘
   │       │       │
   └───────┼───────┘
           ▼
      ┌──────────┐
      │Consensus │
      └──────────┘
2. Hierarchical Mode
A manager agent coordinates worker agents, delegating tasks and synthesizing results.
coordination:
  mode: hierarchical

agents:
  - name: coordinator
    role: manager
    spec:
      instructions: |
        You coordinate the team. Analyze tasks and delegate to specialists.
        Synthesize their findings into a final report.
  - name: log-analyzer
    role: worker
    spec:
      instructions: Focus only on log analysis.
  - name: metrics-analyzer
    role: worker
    spec:
      instructions: Focus only on metrics analysis.
Best for: Complex orchestration, incident response, multi-stage workflows
How it works:
- Manager agent receives task
- Manager analyzes and decides delegation strategy
- Workers execute their assigned portions
- Manager synthesizes final result
         ┌──────────┐
         │   Task   │
         └────┬─────┘
              ▼
         ┌──────────┐
         │ Manager  │
         └────┬─────┘
              │ delegates
    ┌─────────┼─────────┐
    ▼         ▼         ▼
┌────────┐┌────────┐┌────────┐
│Worker 1││Worker 2││Worker 3│
└───┬────┘└───┬────┘└───┬────┘
    │         │         │
    └─────────┼─────────┘
              ▼ results
         ┌──────────┐
         │ Manager  │
         │synthesize│
         └──────────┘
3. Pipeline Mode
Agents execute sequentially, with each agent's output becoming the next agent's input.
coordination:
  mode: pipeline

agents:
  - name: data-collector
    spec:
      instructions: Collect and format raw data.
  - name: analyzer
    spec:
      instructions: Analyze the collected data.
  - name: reporter
    spec:
      instructions: Generate a human-readable report.
Best for: Data transformation, multi-stage processing, ETL workflows
How it works:
- First agent processes original input
- Output passed to next agent as input
- Continues through all agents sequentially
- Final agent produces the result
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Agent 1 │───▶│ Agent 2 │───▶│ Agent 3 │
│(collect) │ │(analyze) │ │(report) │
└──────────┘ └──────────┘ └──────────┘
input ──▶ intermediate ──▶ output
4. Swarm Mode
Self-organizing, dynamic coordination with intelligent load balancing.
coordination:
  mode: swarm
  distribution: least-loaded
Best for: High-volume task processing, load-balanced workloads
How it works:
- Task arrives
- System selects least-loaded idle agent
- Agent processes task
- Metrics tracked for future balancing
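Swarm is the one mode without a full manifest later on this page, so here is a minimal sketch assembled from fields documented elsewhere in this section (`replicas` comes from Best Practices below); the fleet name, model, and instructions are illustrative:

apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: ticket-triage-swarm
spec:
  agents:
    - name: triage-worker
      replicas: 3  # interchangeable instances share the load
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Triage the incoming ticket: classify severity, affected
          component, and suggested owner. Output structured JSON.
  coordination:
    mode: swarm
    distribution: least-loaded  # each task goes to the idlest instance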
5. Tiered Mode (Multi-Model RCA)
Tier-based parallel execution with consensus at each tier. Designed for multi-model scenarios like Root Cause Analysis where cheap data collectors feed reasoning models.
coordination:
  mode: tiered
  consensus:
    algorithm: weighted
    min_confidence: 0.6
  tiered:
    pass_all_results: true
    final_aggregation: manager_synthesis

agents:
  # Tier 1: Data Collectors (cheap models, parallel)
  - name: loki-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash  # ~$0.075/1M tokens
  - name: prometheus-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash

  # Tier 2: Reasoning Models (multi-model consensus)
  - name: claude-analyzer
    tier: 2
    weight: 1.5  # Higher weight for Claude
    spec:
      model: anthropic:claude-sonnet-4-20250514
  - name: gemini-analyzer
    tier: 2
    weight: 1.0
    spec:
      model: google:gemini-2.5-pro

  # Tier 3: Coordinator (synthesis)
  - name: rca-coordinator
    tier: 3
    role: manager
    spec:
      model: anthropic:claude-sonnet-4-20250514
Best for: Multi-model RCA, complex analysis workflows, cost-optimized multi-perspective analysis
How it works:
- Tier 1 agents execute in parallel (cheap data collection)
- Results from Tier 1 are passed to Tier 2 agents
- Tier 2 agents analyze with different LLMs (multi-model consensus)
- Tier 3 manager synthesizes final result
┌─────────────────────────────────────────────────┐
│ TIERED EXECUTION │
│ │
│ TIER 1: Data Collectors (parallel) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Loki │ │ Prom │ │ K8s │ │ Git │ │
│ │ Logs │ │Metrics │ │ State │ │Changes │ │
│ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ │
│ └──────────┼──────────┼──────────┘ │
│ ▼ (collected data) │
│ │
│ TIER 2: Reasoning Models (multi-model) │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Claude │ │ Gemini │ │ GPT-4 │ │
│ │ (wt: 1.5) │ │ (wt: 1.0) │ │ (wt: 1.0) │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ └──────────────┼──────────────┘ │
│ ▼ (weighted consensus) │
│ │
│ TIER 3: Coordinator │
│ ┌────────────────────────────────────────┐ │
│ │ RCA Coordinator │ │
│ │ (synthesize final report) │ │
│ └────────────────────────────────────────┘ │
│ ▼ │
│ Final RCA Report │
└─────────────────────────────────────────────────┘
Tiered Configuration Options:
tiered:
  # Pass all results to next tier (vs just consensus winner)
  pass_all_results: true

  # Final tier aggregation strategy
  final_aggregation: manager_synthesis  # or: consensus, merge

  # Per-tier consensus configuration
  tier_consensus:
    "1":
      algorithm: first_wins  # Fast data collection
    "2":
      algorithm: weighted  # Multi-model consensus
      min_votes: 2
6. Deep Mode (Iterative Planning + Execution)
Deep mode adds an agentic loop pattern: planning → execution → re-planning. Unlike the other modes, which execute once and return, deep mode iterates until the goal is achieved or the iteration limit is reached.
coordination:
  mode: deep
  deep:
    max_iterations: 10
    planning: true
    memory: true
Best for: Complex investigations, root cause analysis, multi-step reasoning tasks
What Deep Mode Adds:
- Planning - LLM generates investigation steps before execution
- Iteration - Execute steps until goal achieved (not just once)
- Re-planning - Adjust plan based on findings mid-execution
- Memory - Persist findings across iterations for context
How it works:
User: /fleet rca "why is API returning 500 errors"
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│ DEEP MODE = Agentic Loop (like Claude Code) │
│ │
│ 1. PLAN: Generate investigation steps │
│ → Check pods, fetch logs, query metrics, correlate │
│ │
│ 2. EXECUTE: Run each step using appropriate tools │
│ → kubectl get pods → kubectl logs → promql query │
│ │
│ 3. ITERATE: Continue until goal achieved │
│ → Found OOM? Check memory limits. Still unclear? Dig. │
│ │
│ 4. SYNTHESIZE: Produce final answer with evidence │
│ → Root cause + evidence + recommendations │
└─────────────────────────────────────────────────────────────┘
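The same loop can be written as a code sketch. This shows the control flow only, not AOF's implementation; every helper below is a hypothetical stub standing in for an LLM or tool call:

from typing import List

# Control-flow sketch of deep mode (not AOF internals). All helpers
# are hypothetical stubs standing in for LLM and tool calls.

def plan(goal: str, memory: List[str]) -> List[str]:
    return [f"check pods for: {goal}", f"fetch logs for: {goal}"]

def execute_step(step: str) -> str:
    return f"finding from '{step}'"

def goal_achieved(memory: List[str]) -> bool:
    return len(memory) >= 4  # stand-in for an LLM judgment call

def synthesize(goal: str, memory: List[str]) -> str:
    return f"Root cause for '{goal}', supported by {len(memory)} findings"

def deep_loop(goal: str, max_iterations: int = 10) -> str:
    memory: List[str] = []                     # persists across iterations
    for _ in range(max_iterations):            # safety limit
        for step in plan(goal, memory):        # 1. PLAN (and re-plan)
            memory.append(execute_step(step))  # 2. EXECUTE with tools
        if goal_achieved(memory):              # 3. ITERATE until done
            break
    return synthesize(goal, memory)            # 4. SYNTHESIZE with evidence

print(deep_loop("why is API returning 500 errors"))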
Example Output:
🔍 RCA Fleet - Investigating...
📋 Plan:
1. Check pod status
2. Fetch error logs
3. Query metrics
4. Correlate findings
⏳ Step 1/4: Checking pods...
→ 2/3 pods in CrashLoopBackOff
⏳ Step 2/4: Fetching logs...
→ OutOfMemoryError at 14:32
⏳ Step 3/4: Querying metrics...
→ Memory at 98% before crash
⏳ Step 4/4: Correlating...
✅ Root Cause: Memory leak in API service
Evidence:
• Pods crashing with OOMKilled
• Memory at 98% before crash
• 3x traffic spike at 14:30
Recommendations:
1. Increase memory limits (immediate)
2. Profile for memory leak (long-term)
Deep Configuration Options:
coordination:
mode: deep
deep:
# Safety limit - stop after N iterations
max_iterations: 10
# Enable planning phase (LLM generates steps)
planning: true
# Persist findings in memory across iterations
memory: true
# Optional: Override model for planning
planner_model: anthropic:claude-sonnet-4-20250514
# Optional: Custom planning prompt
planner_prompt: |
You are planning an investigation. Generate steps to find the root cause.
# Optional: Custom synthesis prompt
synthesizer_prompt: |
Synthesize findings into a clear report with evidence.
When to Use Deep vs Other Modes:
| Scenario | Recommended Mode |
|---|---|
| Multiple perspectives on same task | Peer + Consensus |
| Manager delegates to workers | Hierarchical |
| Sequential data transformation | Pipeline |
| High-volume parallel processing | Swarm |
| Multi-model analysis (cheap → expensive) | Tiered |
| Complex investigation needing iteration | Deep |
Consensus Algorithms
When using Peer or Tiered modes, consensus determines how agent results are aggregated.
Supported Algorithms
| Algorithm | Rule | Use Case |
|---|---|---|
| majority | >50% must agree | Code review, incident triage |
| unanimous | 100% must agree | Critical deployments, security |
| weighted | Votes weighted by role | Senior > Junior reviewers |
| first_wins | First response wins | Time-critical scenarios |
| human_review | Always flag for human | High-stakes decisions |
Algorithm Details
Majority
More than 50% of agents must agree on the result. Fast and tolerant of outliers.
Unanimous
100% of agents must agree. Use for critical decisions where false positives are costly.
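As a minimal sketch, a consensus block for a security gate, using fields from the Configuration section below (the vote count assumes a three-agent fleet):

consensus:
  algorithm: unanimous
  min_votes: 3          # all three agents must respond
  allow_partial: false  # a missing or failed agent blocks approval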
Weighted
Each agent has a configurable weight. Senior reviewers can count more than juniors.
agents:
  - name: senior-reviewer
    weight: 2.0  # Counts as 2 votes
  - name: junior-reviewer
    weight: 1.0  # Counts as 1 vote

consensus:
  algorithm: weighted
  weights:
    senior-reviewer: 2.0
    junior-reviewer: 1.0
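To make the tally concrete, here is a small sketch of how weighted voting works in principle. It illustrates the arithmetic only, not AOF's internal aggregation, and second-junior is a hypothetical third agent:

# Weighted-vote arithmetic sketch (not AOF's internal aggregation).
votes = {
    "senior-reviewer": ("approve", 2.0),
    "junior-reviewer": ("reject", 1.0),
    "second-junior":   ("approve", 1.0),  # hypothetical third agent
}

totals = {}
for agent, (choice, weight) in votes.items():
    totals[choice] = totals.get(choice, 0.0) + weight

total_weight = sum(weight for _, weight in votes.values())
winner = max(totals, key=totals.get)
print(winner, totals[winner] / total_weight)  # approve 0.75 -> clears >50%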
First Wins
First agent to respond wins. Use when speed matters more than consensus.
Human Review
Always flags for human operator decision. Use for high-stakes scenarios.
consensus:
  algorithm: human_review
  min_confidence: 0.9  # If confidence below this, definitely needs review
Configuration
coordination:
  mode: peer
  consensus:
    algorithm: majority   # majority, unanimous, weighted, first_wins, human_review
    min_votes: 2          # Minimum responses required
    timeout_ms: 60000     # Max wait time (60 seconds)
    allow_partial: true   # Accept result if some agents fail
    min_confidence: 0.7   # Below this, flag for human review
    weights:              # Per-agent weights (for weighted algorithm)
      senior-reviewer: 2.0
      junior-reviewer: 1.0
Example: Code Review Consensus
Security Agent: "CRITICAL: SQL injection on line 42"
Performance Agent: "No SQL issues, but N+1 query problem"
Quality Agent: "SQL injection on line 42, missing tests"
Consensus (majority):
- SQL injection CONFIRMED (2/3 agree)
- N+1 query flagged (1/3, noted but not critical)
- Missing tests flagged (1/3, noted)
Task Distribution Strategies
How tasks are assigned to agents (for Hierarchical and Swarm modes):
| Strategy | Description | Best For |
|---|---|---|
| round-robin | Cycle through agents | Even distribution |
| least-loaded | Agent with fewest tasks | Load balancing |
| random | Random selection | Simple scenarios |
| skill-based | Match agent skills to task | Specialized work |
| sticky | Same task type → same agent | Caching benefits |
coordination:
  mode: hierarchical
  distribution: least-loaded  # or round-robin, random, skill-based, sticky
Agent Roles
Agents can have defined roles that affect their behavior in the fleet:
| Role | Description | Used In |
|---|---|---|
| worker | Regular task executor | All modes |
| manager | Coordinator/orchestrator | Hierarchical mode |
| specialist | Domain expert | Any mode |
| validator | Review/validation | Quality gates |
agents:
  - name: security-expert
    role: specialist
    spec:
      instructions: You are a security specialist...
  - name: team-lead
    role: manager
    spec:
      instructions: You coordinate the team...
Communication Patterns
Agents can communicate through shared memory and messaging:
Shared Memory Types
| Type | Description | Use Case |
|---|---|---|
| in_memory | RAM-based | Single process, testing |
| redis | Distributed cache | Multi-instance, real-time |
| sqlite | Local database | Persistent, single node |
| postgres | Distributed DB | Production, multi-node |
Message Patterns
| Pattern | Description | Use Case |
|---|---|---|
| direct | Point-to-point | Agent-to-agent |
| broadcast | All agents receive | Announcements |
| pub_sub | Topic-based | Event-driven |
| request_reply | Request-response | Queries |
communication:
  pattern: broadcast
  broadcast:
    channel: team-updates
    include_sender: false
  shared:
    memory:
      type: redis
      url: redis://localhost:6379
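The other backends from the table follow the same shape. Here is a minimal sketch for a production setup, assuming the postgres backend takes a connection url like the redis example above (credentials and host are placeholders):

communication:
  pattern: direct
  shared:
    memory:
      type: postgres  # production, multi-node (see table above)
      # Assumed to accept a connection URL like the redis backend;
      # the credentials and host below are placeholders.
      url: postgres://aof:aof@db:5432/fleet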
Real-World Examples
Example 1: Code Review Fleet
Three specialists review code in parallel, with majority consensus:
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: code-review-team
spec:
  agents:
    - name: security-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on security: SQL injection, XSS, authentication,
          secrets in code, dependency vulnerabilities.
        tools:
          - read_file
          - git
    - name: performance-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on performance: Big-O complexity, memory leaks,
          N+1 queries, caching opportunities.
        tools:
          - read_file
          - git
    - name: quality-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on quality: SOLID principles, error handling,
          test coverage, documentation.
        tools:
          - read_file
          - git
  coordination:
    mode: peer
    distribution: round-robin
    consensus:
      algorithm: majority
      min_votes: 2
      timeout_ms: 60000
Run it:
aofctl run fleet code-review-team.yaml \
  --input "Review this PR: https://github.com/org/repo/pull/123"
Example 2: Incident Response Fleet
Hierarchical coordination with a manager orchestrating specialists:
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: incident-response-team
spec:
  agents:
    - name: incident-commander
      role: manager
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are the Incident Commander. When an incident arrives:
          1. Assess severity and impact
          2. Delegate investigation to specialists
          3. Coordinate response actions
          4. Synthesize findings into an incident report
        tools:
          - shell
    - name: log-investigator
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze logs. Search for errors, exceptions, and anomalies.
          Report timeline of events and root cause indicators.
        tools:
          - shell
          - read_file
    - name: metrics-analyst
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze metrics and dashboards. Look for:
          - Resource exhaustion (CPU, memory, disk)
          - Traffic anomalies
          - Error rate spikes
        tools:
          - shell
  coordination:
    mode: hierarchical
    distribution: skill-based
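Run it (same invocation pattern as Example 1; the incident description is a placeholder):

aofctl run fleet incident-response-team.yaml \
  --input "P1: checkout API returning 500 errors since 14:30 UTC"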
Example 3: Data Pipeline Fleet
Sequential processing with each stage building on the previous:
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: data-pipeline
spec:
  agents:
    - name: collector
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Collect and normalize raw data from the input source.
          Output clean, structured JSON.
    - name: analyzer
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Analyze the structured data. Identify patterns, anomalies,
          and key insights. Output analysis summary.
    - name: reporter
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Generate a human-readable report from the analysis.
          Include executive summary, key findings, and recommendations.
  coordination:
    mode: pipeline
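Run it (the input is a placeholder; the first agent in the pipeline receives it as its raw input):

aofctl run fleet data-pipeline.yaml \
  --input "Analyze last week's error logs and summarize key trends"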
Fleet Metrics
Fleets automatically track execution metrics:
metrics:
  total_tasks: 150
  completed_tasks: 145
  failed_tasks: 5
  avg_task_duration_ms: 2340
  active_agents: 3
  total_agents: 3
  consensus_rounds: 145  # For peer mode
Access via CLI:
aofctl describe fleet code-review-team
Best Practices
1. Choose the Right Mode
| Situation | Recommended Mode |
|---|---|
| Need multiple perspectives | Peer + Consensus |
| Complex orchestration | Hierarchical |
| Sequential processing | Pipeline |
| High-volume, uniform tasks | Swarm |
2. Optimize Agent Instructions
Each agent should have focused, specific instructions:
# ❌ Bad: Too generic
instructions: Review the code for issues.

# ✅ Good: Focused and specific
instructions: |
  You are a SECURITY specialist. Focus ONLY on:
  - SQL injection vulnerabilities
  - XSS attack vectors
  - Authentication/authorization flaws
  - Secrets or credentials in code
  - Insecure dependencies

  Ignore performance and style issues - other agents handle those.
3. Use Appropriate Consensus
| Scenario | Algorithm |
|---|---|
| General review | majority |
| Security-critical | unanimous |
| Mixed seniority | weighted |
| Time-critical | first_wins |
4. Set Reasonable Timeouts
consensus:
  timeout_ms: 60000    # 60 seconds for most tasks
  allow_partial: true  # Don't fail if one agent is slow
5. Use Replicas for Scaling
agents:
  - name: worker
    replicas: 3  # 3 instances for load balancing
Summary
| Aspect | Single Agent | Fleet |
|---|---|---|
| Perspectives | 1 | Multiple |
| Reliability | Single point of failure | Consensus-validated |
| Cost | Can be expensive | Often cheaper (parallel cheap models) |
| Latency | Sequential | Parallel execution |
| Specialization | Generalist | Deep expertise per agent |
| Audit Trail | Limited | Full consensus history |
Rule of thumb: Use fleets for anything critical, multi-perspective, or high-volume. Use single agents for simple, conversational, or cost-sensitive tasks.
Next Steps: