
AgentFleet: Multi-Agent Coordination

AgentFleet enables multiple AI agents to work together on complex tasks. Think of it like a Kubernetes Deployment - multiple specialized pods working in parallel toward a common goal.

Quick Reference: Coordination Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| `peer` | All agents parallel + consensus | Code review, voting |
| `hierarchical` | Manager coordinates workers | Complex orchestration |
| `pipeline` | Sequential handoff | Data transformation |
| `swarm` | Self-organizing, load balanced | High-volume parallel |
| `tiered` | Tier-based parallel | Multi-model RCA |
| `deep` | Iterative planning + execution (new) | Complex investigations |

Agent-First Architecture

AOF uses a simple, composable model - Agent for 95% of tasks, Fleet when sophisticated reasoning is needed:

| Concept | What It Is | Example |
|---------|------------|---------|
| Agent | Single-purpose specialist | `k8s-agent`, `prometheus-agent` |
| Fleet | Team of agents for a purpose | `devops-fleet`, `rca-fleet` |
| Flow | Event routing to fleets | Telegram → Fleet → Response |

The key insight: Don't build "super agents" with many tools. Build focused agents, then compose them into fleets.

```yaml
# ✅ GOOD: Fleet composes single-purpose agents
apiVersion: aof.dev/v1alpha1
kind: AgentFleet
metadata:
  name: devops-fleet
spec:
  agents:
    - ref: library/k8s-agent.yaml     # kubectl, helm only
    - ref: library/docker-agent.yaml  # docker only
    - ref: library/git-agent.yaml     # git only
```

```yaml
# ❌ BAD: One agent with too many tools
spec:
  tools: [kubectl, docker, git, terraform, aws, helm]
  # Hard to maintain, not reusable, unfocused
```

Why Use Fleets?

The Single Agent Problem

A single AI agent, even a powerful one like Claude or GPT-4, has limitations:

  • Single perspective: One model, one viewpoint
  • Blind spots: May miss domain-specific issues
  • Hallucination risk: No cross-validation
  • Single point of failure: If it's wrong, you're wrong

The Fleet Solution

Fleets solve these problems through specialization and consensus:

┌─────────────────────────────────────────────────────────────┐
│ SINGLE AGENT │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Generalist Agent │ │
│ │ - Knows security (somewhat) │ │
│ │ - Knows performance (somewhat) │ │
│ │ - Knows style (somewhat) │ │
│ │ - Single point of failure │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ AGENT FLEET │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Quality │ │
│ │ Specialist │ │ Specialist │ │ Specialist │ │
│ │ (focused) │ │ (focused) │ │ (focused) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ CONSENSUS │ │
│ └─────────────┘ │
│ ▼ │
│ Unified, validated result │
└─────────────────────────────────────────────────────────────┘

When to Use Fleet vs Single Agent

Use a Single Agent When:

  • Task is straightforward (answer questions, write simple code)
  • You need conversational memory across turns
  • Cost is a primary concern
  • Latency is critical (sub-second responses)

Use a Fleet When:

| Scenario | Why Fleet Wins |
|----------|----------------|
| Code Review | 3 specialists catch more than 1 generalist |
| Incident Response | Parallel analysis of logs, metrics, traces |
| Critical Decisions | Consensus reduces hallucination risk |
| Cost Optimization | Cheap models in parallel vs expensive single model |
| Compliance/Audit | "3 agents agreed on this decision" |
| Specialization | Focused instructions → better results |

Cost Comparison

```
Single Claude Opus:
- 1 call × $15/1M tokens
- 1 perspective
- ~30s response time

Fleet of 3 Gemini Flash:
- 3 parallel calls × $0.075/1M tokens
- 3 perspectives + consensus
- ~5s response time (parallel execution!)
```
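A quick sanity check on the arithmetic behind the comparison above, using the per-1M-token rates quoted on this page (illustrative numbers, not live pricing):

```python
# Back-of-the-envelope cost math for the Opus-vs-Flash-fleet comparison.
# Rates are the illustrative per-1M-token prices quoted in this doc.
single_opus_cost = 1 * 15.00   # 1 call at $15 / 1M tokens
fleet_flash_cost = 3 * 0.075   # 3 parallel calls at $0.075 / 1M tokens

print(f"Single Opus:  ${single_opus_cost:.2f}")
print(f"Fleet Flash:  ${fleet_flash_cost:.3f}")
print(f"Fleet is ~{single_opus_cost / fleet_flash_cost:.0f}x cheaper per 1M tokens")
```

Even with three calls instead of one, the fleet costs a small fraction of the single expensive call, and the calls run concurrently so latency drops too.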

Coordination Modes

AOF supports six coordination modes for different use cases:

1. Peer Mode (Default)

All agents work as equals, executing in parallel with results aggregated via consensus.

```yaml
coordination:
  mode: peer
  distribution: round-robin
  consensus:
    algorithm: majority
    min_votes: 2
```

Best for: Code review, multi-perspective analysis, voting scenarios

How it works:

  1. Task submitted to all agents simultaneously
  2. Each agent executes independently (in parallel)
  3. Results collected from all agents
  4. Consensus algorithm determines final result
```
        ┌──────────┐
        │   Task   │
        └────┬─────┘
             │
     ┌───────┼───────┐
     ▼       ▼       ▼
  ┌──────┐┌──────┐┌──────┐
  │Agent1││Agent2││Agent3│   (parallel)
  └──┬───┘└──┬───┘└──┬───┘
     └───────┼───────┘
             ▼
        ┌──────────┐
        │Consensus │
        └──────────┘
```

2. Hierarchical Mode

A manager agent coordinates worker agents, delegating tasks and synthesizing results.

```yaml
coordination:
  mode: hierarchical

agents:
  - name: coordinator
    role: manager
    spec:
      instructions: |
        You coordinate the team. Analyze tasks and delegate to specialists.
        Synthesize their findings into a final report.

  - name: log-analyzer
    role: worker
    spec:
      instructions: Focus only on log analysis.

  - name: metrics-analyzer
    role: worker
    spec:
      instructions: Focus only on metrics analysis.
```

Best for: Complex orchestration, incident response, multi-stage workflows

How it works:

  1. Manager agent receives task
  2. Manager analyzes and decides delegation strategy
  3. Workers execute their assigned portions
  4. Manager synthesizes final result
```
          ┌──────────┐
          │   Task   │
          └────┬─────┘
               ▼
          ┌──────────┐
          │ Manager  │
          └────┬─────┘
               │ delegates
     ┌─────────┼─────────┐
     ▼         ▼         ▼
 ┌────────┐┌────────┐┌────────┐
 │Worker 1││Worker 2││Worker 3│
 └───┬────┘└───┬────┘└───┬────┘
     └─────────┼─────────┘
               ▼ results
          ┌──────────┐
          │ Manager  │
          │synthesize│
          └──────────┘
```
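The delegate-then-synthesize loop can be sketched in a few lines. Everything here (`hierarchical`, the planner, the toy workers) is hypothetical scaffolding to show the data flow, not an AOF API:

```python
# Hierarchical-mode sketch: the manager decides who gets what,
# workers execute their portions, the manager synthesizes a report.
def hierarchical(manager_plan, workers, synthesize, task):
    assignments = manager_plan(task, list(workers))     # manager delegates
    results = {name: workers[name](subtask)             # workers execute
               for name, subtask in assignments.items()}
    return synthesize(results)                          # manager synthesizes

workers = {
    "log-analyzer": lambda t: f"logs ok for {t}",
    "metrics-analyzer": lambda t: f"metrics ok for {t}",
}
plan = lambda task, names: {n: task for n in names}  # naive: everyone gets the full task
report = hierarchical(plan, workers, lambda r: "; ".join(sorted(r.values())),
                      "api-outage")
print(report)  # logs ok for api-outage; metrics ok for api-outage
```

In a real fleet the manager step is itself an LLM call that chooses sub-tasks; here it is a fixed function so the control flow is visible.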

3. Pipeline Mode

Agents execute sequentially, with each agent's output becoming the next agent's input.

```yaml
coordination:
  mode: pipeline

agents:
  - name: data-collector
    spec:
      instructions: Collect and format raw data.

  - name: analyzer
    spec:
      instructions: Analyze the collected data.

  - name: reporter
    spec:
      instructions: Generate a human-readable report.
```

Best for: Data transformation, multi-stage processing, ETL workflows

How it works:

  1. First agent processes original input
  2. Output passed to next agent as input
  3. Continues through all agents sequentially
  4. Final agent produces the result
```
┌──────────┐    ┌──────────┐    ┌──────────┐
│ Agent 1  │───▶│ Agent 2  │───▶│ Agent 3  │
│(collect) │    │(analyze) │    │(report)  │
└──────────┘    └──────────┘    └──────────┘
   input ──▶  intermediate  ──▶   output
```
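Structurally, a pipeline is just a left fold over the agent list: each stage's output is fed to the next. A minimal sketch, with plain functions standing in for agents:

```python
# Pipeline-mode sketch: thread one value through a sequence of stages.
# The stage functions are placeholders for collector/analyzer/reporter agents.
from functools import reduce

def run_pipeline(stages, data):
    return reduce(lambda acc, stage: stage(acc), stages, data)

stages = [
    lambda raw: raw.split(","),        # collector: normalize raw input
    lambda items: len(items),          # analyzer: derive a metric
    lambda n: f"report: {n} records",  # reporter: human-readable output
]
print(run_pipeline(stages, "a,b,c"))  # report: 3 records
```

Note that stage order matters and each stage must accept whatever shape the previous one emits, which is why the instructions above tell each agent what format to output.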

4. Swarm Mode

Self-organizing, dynamic coordination with intelligent load balancing.

```yaml
coordination:
  mode: swarm
  distribution: least-loaded
```

Best for: High-volume task processing, load-balanced workloads

How it works:

  1. Task arrives
  2. System selects least-loaded idle agent
  3. Agent processes task
  4. Metrics tracked for future balancing
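The selection step above is essentially a `min` over current load counts. A sketch of the idea (the dict of loads is illustrative, not AOF's internal state):

```python
# Least-loaded selection as used in swarm mode: pick the agent with the
# fewest in-flight tasks, then record the assignment for future balancing.
def pick_least_loaded(loads):
    return min(loads, key=loads.get)

loads = {"agent-a": 5, "agent-b": 2, "agent-c": 7}
chosen = pick_least_loaded(loads)
loads[chosen] += 1  # step 4: track the new assignment
print(chosen)       # agent-b
```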

5. Tiered Mode (Multi-Model RCA)

Tier-based parallel execution with consensus at each tier. Designed for multi-model scenarios like Root Cause Analysis where cheap data collectors feed reasoning models.

```yaml
coordination:
  mode: tiered
  consensus:
    algorithm: weighted
    min_confidence: 0.6
  tiered:
    pass_all_results: true
    final_aggregation: manager_synthesis

agents:
  # Tier 1: Data Collectors (cheap models, parallel)
  - name: loki-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash  # ~$0.075/1M tokens

  - name: prometheus-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash

  # Tier 2: Reasoning Models (multi-model consensus)
  - name: claude-analyzer
    tier: 2
    weight: 1.5  # Higher weight for Claude
    spec:
      model: anthropic:claude-sonnet-4-20250514

  - name: gemini-analyzer
    tier: 2
    weight: 1.0
    spec:
      model: google:gemini-2.5-pro

  # Tier 3: Coordinator (synthesis)
  - name: rca-coordinator
    tier: 3
    role: manager
    spec:
      model: anthropic:claude-sonnet-4-20250514
```

Best for: Multi-model RCA, complex analysis workflows, cost-optimized multi-perspective analysis

How it works:

  1. Tier 1 agents execute in parallel (cheap data collection)
  2. Results from Tier 1 are passed to Tier 2 agents
  3. Tier 2 agents analyze with different LLMs (multi-model consensus)
  4. Tier 3 manager synthesizes final result
```
┌─────────────────────────────────────────────────┐
│               TIERED EXECUTION                  │
│                                                 │
│  TIER 1: Data Collectors (parallel)             │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐    │
│  │  Loki  │ │  Prom  │ │  K8s   │ │  Git   │    │
│  │  Logs  │ │Metrics │ │ State  │ │Changes │    │
│  └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘    │
│      └──────────┼──────────┼──────────┘         │
│                 ▼ (collected data)              │
│                                                 │
│  TIER 2: Reasoning Models (multi-model)         │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐   │
│  │   Claude   │ │   Gemini   │ │   GPT-4    │   │
│  │ (wt: 1.5)  │ │ (wt: 1.0)  │ │ (wt: 1.0)  │   │
│  └─────┬──────┘ └─────┬──────┘ └─────┬──────┘   │
│        └──────────────┼──────────────┘          │
│                       ▼ (weighted consensus)    │
│                                                 │
│  TIER 3: Coordinator                            │
│  ┌────────────────────────────────────────┐     │
│  │           RCA Coordinator              │     │
│  │      (synthesize final report)         │     │
│  └────────────────────────────────────────┘     │
│                       ▼                         │
│               Final RCA Report                  │
└─────────────────────────────────────────────────┘
```
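The tier-by-tier flow with `pass_all_results: true` can be sketched as a loop that collects every result in a tier and feeds the list forward. This is a conceptual model, not AOF's implementation:

```python
# Tiered-execution sketch: agents are (tier, fn) pairs sorted by tier.
# Within a tier all agents see the same input; the full list of their
# results is passed to the next tier (pass_all_results behavior).
from itertools import groupby

def run_tiered(agents, task):
    data = task
    for _, group in groupby(agents, key=lambda a: a[0]):  # requires sorted input
        data = [fn(data) for _, fn in group]  # all tier results pass forward
    return data

agents = [
    (1, lambda d: f"logs({d})"), (1, lambda d: f"metrics({d})"),
    (2, lambda d: f"analysis of {len(d)} inputs"),
]
print(run_tiered(agents, "incident"))  # ['analysis of 2 inputs']
```

Per-tier consensus (as configured under `tier_consensus` below) would replace the plain list hand-off with an aggregation step between tiers.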

Tiered Configuration Options:

```yaml
tiered:
  # Pass all results to next tier (vs just consensus winner)
  pass_all_results: true

  # Final tier aggregation strategy
  final_aggregation: manager_synthesis  # or: consensus, merge

  # Per-tier consensus configuration
  tier_consensus:
    "1":
      algorithm: first_wins  # Fast data collection
    "2":
      algorithm: weighted    # Multi-model consensus
      min_votes: 2
```

6. Deep Mode (Iterative Planning + Execution)

Deep mode adds an agentic loop pattern: planning → execution → re-planning. Unlike other modes that execute once and return, deep mode iterates until the goal is achieved or max iterations reached.

```yaml
coordination:
  mode: deep
  deep:
    max_iterations: 10
    planning: true
    memory: true
```

Best for: Complex investigations, root cause analysis, multi-step reasoning tasks

What Deep Mode Adds:

  1. Planning - LLM generates investigation steps before execution
  2. Iteration - Execute steps until goal achieved (not just once)
  3. Re-planning - Adjust plan based on findings mid-execution
  4. Memory - Persist findings across iterations for context

How it works:

```
User: /fleet rca "why is API returning 500 errors"

┌─────────────────────────────────────────────────────────────┐
│  DEEP MODE = Agentic Loop (like Claude Code)                │
│                                                             │
│  1. PLAN: Generate investigation steps                      │
│     → Check pods, fetch logs, query metrics, correlate      │
│                                                             │
│  2. EXECUTE: Run each step using appropriate tools          │
│     → kubectl get pods → kubectl logs → promql query        │
│                                                             │
│  3. ITERATE: Continue until goal achieved                   │
│     → Found OOM? Check memory limits. Still unclear? Dig.   │
│                                                             │
│  4. SYNTHESIZE: Produce final answer with evidence          │
│     → Root cause + evidence + recommendations               │
└─────────────────────────────────────────────────────────────┘
```
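Stripped of the LLM calls, the agentic loop is: plan, execute the steps, check the goal, re-plan, stop at `max_iterations`. A toy sketch where `plan`, `execute`, and `done` are stand-ins for the LLM-driven phases:

```python
# Deep-mode sketch: iterate plan → execute → goal-check until done or
# max_iterations is reached. `findings` plays the role of the memory
# that persists across iterations.
def deep_loop(plan, execute, done, task, max_iterations=10):
    findings = []  # "memory" carried between iterations
    for _ in range(max_iterations):
        for step in plan(task, findings):  # re-plan using findings so far
            findings.append(execute(step))
        if done(findings):
            return findings
    return findings  # gave up at the safety limit

plan = lambda task, f: ["check pods"] if not f else ["fetch logs"]
execute = lambda step: f"{step}: done"
done = lambda f: "fetch logs: done" in f
print(deep_loop(plan, execute, done, "why 500 errors?"))
```

The `max_iterations` bound is the safety valve: without it, a goal check that never passes would loop forever.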

Example Output:

```
🔍 RCA Fleet - Investigating...

📋 Plan:
1. Check pod status
2. Fetch error logs
3. Query metrics
4. Correlate findings

⏳ Step 1/4: Checking pods...
   → 2/3 pods in CrashLoopBackOff

⏳ Step 2/4: Fetching logs...
   → OutOfMemoryError at 14:32

⏳ Step 3/4: Querying metrics...
   → Memory at 98% before crash

⏳ Step 4/4: Correlating...

✅ Root Cause: Memory leak in API service

Evidence:
• Pods crashing with OOMKilled
• Memory at 98% before crash
• 3x traffic spike at 14:30

Recommendations:
1. Increase memory limits (immediate)
2. Profile for memory leak (long-term)
```

Deep Configuration Options:

```yaml
coordination:
  mode: deep
  deep:
    # Safety limit - stop after N iterations
    max_iterations: 10

    # Enable planning phase (LLM generates steps)
    planning: true

    # Persist findings in memory across iterations
    memory: true

    # Optional: Override model for planning
    planner_model: anthropic:claude-sonnet-4-20250514

    # Optional: Custom planning prompt
    planner_prompt: |
      You are planning an investigation. Generate steps to find the root cause.

    # Optional: Custom synthesis prompt
    synthesizer_prompt: |
      Synthesize findings into a clear report with evidence.
```

When to Use Deep vs Other Modes:

| Scenario | Recommended Mode |
|----------|------------------|
| Multiple perspectives on same task | Peer + Consensus |
| Manager delegates to workers | Hierarchical |
| Sequential data transformation | Pipeline |
| High-volume parallel processing | Swarm |
| Multi-model analysis (cheap → expensive) | Tiered |
| Complex investigation needing iteration | Deep |

Consensus Algorithms

When using Peer or Tiered modes, consensus determines how agent results are aggregated.

Supported Algorithms

| Algorithm | Rule | Use Case |
|-----------|------|----------|
| `majority` | >50% must agree | Code review, incident triage |
| `unanimous` | 100% must agree | Critical deployments, security |
| `weighted` | Votes weighted by role | Senior > Junior reviewers |
| `first_wins` | First response wins | Time-critical scenarios |
| `human_review` | Always flag for human | High-stakes decisions |

Algorithm Details

Majority

More than 50% of agents must agree on the result. Fast and tolerant of outliers.

Unanimous

100% of agents must agree. Use for critical decisions where false positives are costly.

Weighted

Each agent has a configurable weight. Senior reviewers can count more than juniors.

```yaml
agents:
  - name: senior-reviewer
    weight: 2.0  # Counts as 2 votes

  - name: junior-reviewer
    weight: 1.0  # Counts as 1 vote

consensus:
  algorithm: weighted
  weights:
    senior-reviewer: 2.0
    junior-reviewer: 1.0
```
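Weighted voting is a small generalization of majority voting: each agent's vote is tallied with its weight instead of counting as 1. A sketch of the tally (illustrative, not AOF's implementation):

```python
# Weighted-consensus sketch: sum weights per distinct answer and return
# the heaviest one. Unknown agents default to weight 1.0.
def weighted_consensus(votes, weights):
    tally = {}
    for agent, vote in votes.items():
        tally[vote] = tally.get(vote, 0.0) + weights.get(agent, 1.0)
    return max(tally, key=tally.get)

votes = {"senior-reviewer": "approve", "junior-reviewer": "reject"}
weights = {"senior-reviewer": 2.0, "junior-reviewer": 1.0}
print(weighted_consensus(votes, weights))  # approve (2.0 vs 1.0)
```

With these weights the senior reviewer alone outvotes the junior, which is exactly the "Senior > Junior" behavior the algorithm table describes.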

FirstWins

First agent to respond wins. Use when speed matters more than consensus.

HumanReview

Always flags for human operator decision. Use for high-stakes scenarios.

```yaml
consensus:
  algorithm: human_review
  min_confidence: 0.9  # If confidence below this, definitely needs review
```

Configuration

```yaml
coordination:
  mode: peer
  consensus:
    algorithm: majority   # majority, unanimous, weighted, first_wins, human_review
    min_votes: 2          # Minimum responses required
    timeout_ms: 60000     # Max wait time (60 seconds)
    allow_partial: true   # Accept result if some agents fail
    min_confidence: 0.7   # Below this, flag for human review
    weights:              # Per-agent weights (for weighted algorithm)
      senior-reviewer: 2.0
      junior-reviewer: 1.0
```

Example: Code Review Consensus

```
Security Agent:    "CRITICAL: SQL injection on line 42"
Performance Agent: "No SQL issues, but N+1 query problem"
Quality Agent:     "SQL injection on line 42, missing tests"

Consensus (majority):
- SQL injection CONFIRMED (2/3 agree)
- N+1 query flagged (1/3, noted but not critical)
- Missing tests flagged (1/3, noted)
```

Task Distribution Strategies

How tasks are assigned to agents (for Hierarchical and Swarm modes):

| Strategy | Description | Best For |
|----------|-------------|----------|
| `round-robin` | Cycle through agents | Even distribution |
| `least-loaded` | Agent with fewest tasks | Load balancing |
| `random` | Random selection | Simple scenarios |
| `skill-based` | Match agent skills to task | Specialized work |
| `sticky` | Same task type → same agent | Caching benefits |

```yaml
coordination:
  mode: hierarchical
  distribution: least-loaded  # or round-robin, random, skill-based, sticky
```
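The two most common strategies are easy to picture side by side (a sketch of the selection logic, not AOF internals):

```python
# Round-robin vs least-loaded selection in a few lines.
# itertools.cycle gives the round-robin rotation; least-loaded
# picks whichever agent currently has the fewest tasks.
from itertools import cycle, islice

rr = cycle(["agent-1", "agent-2", "agent-3"])
print(list(islice(rr, 5)))  # ['agent-1', 'agent-2', 'agent-3', 'agent-1', 'agent-2']

loads = {"agent-1": 4, "agent-2": 1, "agent-3": 3}
print(min(loads, key=loads.get))  # agent-2
```

Round-robin ignores how busy agents are, which is fine when tasks are uniform; least-loaded adapts when task durations vary.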

Agent Roles

Agents can have defined roles that affect their behavior in the fleet:

| Role | Description | Used In |
|------|-------------|---------|
| `worker` | Regular task executor | All modes |
| `manager` | Coordinator/orchestrator | Hierarchical mode |
| `specialist` | Domain expert | Any mode |
| `validator` | Review/validation | Quality gates |

```yaml
agents:
  - name: security-expert
    role: specialist
    spec:
      instructions: You are a security specialist...

  - name: team-lead
    role: manager
    spec:
      instructions: You coordinate the team...
```

Communication Patterns

Agents can communicate through shared memory and messaging:

Shared Memory Types

| Type | Description | Use Case |
|------|-------------|----------|
| `in_memory` | RAM-based | Single process, testing |
| `redis` | Distributed cache | Multi-instance, real-time |
| `sqlite` | Local database | Persistent, single node |
| `postgres` | Distributed DB | Production, multi-node |

Message Patterns

| Pattern | Description | Use Case |
|---------|-------------|----------|
| `direct` | Point-to-point | Agent-to-agent |
| `broadcast` | All agents receive | Announcements |
| `pub_sub` | Topic-based | Event-driven |
| `request_reply` | Request-response | Queries |

```yaml
communication:
  pattern: broadcast
  broadcast:
    channel: team-updates
    include_sender: false

shared:
  memory:
    type: redis
    url: redis://localhost:6379
```

Real-World Examples

Example 1: Code Review Fleet

Three specialists review code in parallel, with majority consensus:

```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: code-review-team
spec:
  agents:
    - name: security-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on security: SQL injection, XSS, authentication,
          secrets in code, dependency vulnerabilities.
        tools:
          - read_file
          - git

    - name: performance-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on performance: Big-O complexity, memory leaks,
          N+1 queries, caching opportunities.
        tools:
          - read_file
          - git

    - name: quality-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on quality: SOLID principles, error handling,
          test coverage, documentation.
        tools:
          - read_file
          - git

  coordination:
    mode: peer
    distribution: round-robin
    consensus:
      algorithm: majority
      min_votes: 2
      timeout_ms: 60000
```

Run it:

```shell
aofctl run fleet code-review-team.yaml \
  --input "Review this PR: https://github.com/org/repo/pull/123"
```

Example 2: Incident Response Fleet

Hierarchical coordination with a manager orchestrating specialists:

```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: incident-response-team
spec:
  agents:
    - name: incident-commander
      role: manager
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are the Incident Commander. When an incident arrives:
          1. Assess severity and impact
          2. Delegate investigation to specialists
          3. Coordinate response actions
          4. Synthesize findings into an incident report
        tools:
          - shell

    - name: log-investigator
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze logs. Search for errors, exceptions, and anomalies.
          Report timeline of events and root cause indicators.
        tools:
          - shell
          - read_file

    - name: metrics-analyst
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze metrics and dashboards. Look for:
          - Resource exhaustion (CPU, memory, disk)
          - Traffic anomalies
          - Error rate spikes
        tools:
          - shell

  coordination:
    mode: hierarchical
    distribution: skill-based
```

Example 3: Data Pipeline Fleet

Sequential processing with each stage building on the previous:

```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: data-pipeline
spec:
  agents:
    - name: collector
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Collect and normalize raw data from the input source.
          Output clean, structured JSON.

    - name: analyzer
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Analyze the structured data. Identify patterns, anomalies,
          and key insights. Output analysis summary.

    - name: reporter
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Generate a human-readable report from the analysis.
          Include executive summary, key findings, and recommendations.

  coordination:
    mode: pipeline
```

Fleet Metrics

Fleets automatically track execution metrics:

```yaml
metrics:
  total_tasks: 150
  completed_tasks: 145
  failed_tasks: 5
  avg_task_duration_ms: 2340
  active_agents: 3
  total_agents: 3
  consensus_rounds: 145  # For peer mode
```

Access via CLI:

```shell
aofctl describe fleet code-review-team
```

Best Practices

1. Choose the Right Mode

| Situation | Recommended Mode |
|-----------|------------------|
| Need multiple perspectives | Peer + Consensus |
| Complex orchestration | Hierarchical |
| Sequential processing | Pipeline |
| High-volume, uniform tasks | Swarm |

2. Optimize Agent Instructions

Each agent should have focused, specific instructions:

```yaml
# ❌ Bad: Too generic
instructions: Review the code for issues.
```

```yaml
# ✅ Good: Focused and specific
instructions: |
  You are a SECURITY specialist. Focus ONLY on:
  - SQL injection vulnerabilities
  - XSS attack vectors
  - Authentication/authorization flaws
  - Secrets or credentials in code
  - Insecure dependencies

  Ignore performance and style issues - other agents handle those.
```

3. Use Appropriate Consensus

| Scenario | Algorithm |
|----------|-----------|
| General review | `majority` |
| Security-critical | `unanimous` |
| Mixed seniority | `weighted` |
| Time-critical | `first_wins` |

4. Set Reasonable Timeouts

```yaml
consensus:
  timeout_ms: 60000    # 60 seconds for most tasks
  allow_partial: true  # Don't fail if one agent is slow
```

5. Use Replicas for Scaling

```yaml
agents:
  - name: worker
    replicas: 3  # 3 instances for load balancing
```

Summary

| Aspect | Single Agent | Fleet |
|--------|--------------|-------|
| Perspectives | 1 | Multiple |
| Reliability | Single point of failure | Consensus-validated |
| Cost | Can be expensive | Often cheaper (parallel cheap models) |
| Latency | Sequential | Parallel execution |
| Specialization | Generalist | Deep expertise per agent |
| Audit Trail | Limited | Full consensus history |

Rule of thumb: Use fleets for anything critical, multi-perspective, or high-volume. Use single agents for simple, conversational, or cost-sensitive tasks.
