# AgentFleet: Multi-Agent Coordination
AgentFleet enables multiple AI agents to work together on complex tasks. Think of it like a Kubernetes Deployment - multiple specialized pods working in parallel toward a common goal.
## Quick Reference: Coordination Modes

| Mode | Description | Use Case |
|---|---|---|
| `peer` | All agents run in parallel; results combined by consensus or aggregation | Code review, multi-expert analysis |
| `hierarchical` | Manager coordinates workers | Complex orchestration |
| `pipeline` | Sequential handoff between agents | Data transformation |
| `swarm` | Self-organizing, load-balanced | High-volume parallel processing |
| `tiered` | Tier-based parallel execution | Multi-model RCA |
| `deep` | Iterative planning + execution | Complex investigations |
## Fleet Patterns for Agentic Ops
Choose the right fleet pattern for your use case:
| Use Case | Fleet Pattern | Why |
|---|---|---|
| Code Review | Peer + Aggregation | Multiple specialists (security, quality) provide complementary findings |
| Root Cause Analysis | Tiered or Deep | Data collectors → Reasoners → Synthesis |
| Incident Response | Hierarchical | Coordinator delegates to specialists |
| Multi-Model Validation | Peer + Consensus | Multiple LLMs validate same hypothesis |
| Data Pipeline | Pipeline | Sequential transformation stages |
| High-Volume Processing | Swarm | Self-organizing load balancing |
| Complex Investigation | Deep | Iterative planning, execution, re-planning |
| Change Approval | Peer + Consensus | Voting/unanimous agreement |
| Parallel Data Collection | Peer + Aggregation | Logs + Metrics + Traces agents |
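The last row of this table maps directly onto fleet YAML. A minimal sketch, assuming three hypothetical collector agents (`logs-agent`, `metrics-agent`, `traces-agent`) exist in your agent library:

```yaml
# Sketch: parallel data collection via Peer + Aggregation.
# The three library/*.yaml agent files are assumptions, not AOF built-ins.
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: telemetry-fleet
spec:
  agents:
    - ref: library/logs-agent.yaml
    - ref: library/metrics-agent.yaml
    - ref: library/traces-agent.yaml
  coordination:
    mode: peer
    aggregation: merge  # collect all three result sets instead of voting
```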
### Quick Decision Guide

```
Need multiple perspectives on the SAME task?
├── Yes: Use Peer + Consensus (pick best)
└── No: Different specialists?
    ├── Yes: Use Peer + Aggregation (collect all)
    └── No: Sequential processing?
        ├── Yes: Use Pipeline
        └── No: Complex orchestration?
            ├── Yes: Use Hierarchical
            └── No: Iterative investigation?
                ├── Yes: Use Deep
                └── No: High-volume?
                    ├── Yes: Use Swarm
                    └── No: Multi-tier RCA?
                        └── Yes: Use Tiered
```
## Agent-First Architecture

AOF uses a simple, composable model - Agent for 95% of tasks, Fleet when sophisticated reasoning is needed:
| Concept | What It Is | Example |
|---|---|---|
| Agent | Single-purpose specialist | k8s-agent, prometheus-agent |
| Fleet | Team of agents for a purpose | devops-fleet, rca-fleet |
| Flow | Event routing to fleets | Telegram → Fleet → Response |
The key insight: Don't build "super agents" with many tools. Build focused agents, then compose them into fleets.
```yaml
# ✅ GOOD: Fleet composes single-purpose agents
apiVersion: aof.dev/v1alpha1
kind: AgentFleet
metadata:
  name: devops-fleet
spec:
  agents:
    - ref: library/k8s-agent.yaml    # kubectl, helm only
    - ref: library/docker-agent.yaml # docker only
    - ref: library/git-agent.yaml    # git only
```

```yaml
# ❌ BAD: One agent with too many tools
spec:
  tools: [kubectl, docker, git, terraform, aws, helm]
  # Hard to maintain, not reusable, unfocused
```
## Why Use Fleets?

### The Single Agent Problem

A single AI agent, even a powerful one like Claude or GPT-4, has limitations:
- Single perspective: One model, one viewpoint
- Blind spots: May miss domain-specific issues
- Hallucination risk: No cross-validation
- Single point of failure: If it's wrong, you're wrong
### The Fleet Solution

Fleets solve these problems through specialization and consensus:
```
┌───────────────────────────────────────────────┐
│                 SINGLE AGENT                  │
│  ┌─────────────────────────────────────────┐  │
│  │            Generalist Agent             │  │
│  │  - Knows security (somewhat)            │  │
│  │  - Knows performance (somewhat)         │  │
│  │  - Knows style (somewhat)               │  │
│  │  - Single point of failure              │  │
│  └─────────────────────────────────────────┘  │
└───────────────────────────────────────────────┘

┌───────────────────────────────────────────────┐
│                  AGENT FLEET                  │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │ Security  │  │Performance│  │  Quality  │  │
│  │Specialist │  │Specialist │  │Specialist │  │
│  │ (focused) │  │ (focused) │  │ (focused) │  │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  │
│        └──────────────┼──────────────┘        │
│                       ▼                       │
│               ┌───────────────┐               │
│               │   CONSENSUS   │               │
│               └───────────────┘               │
│                       ▼                       │
│           Unified, validated result           │
└───────────────────────────────────────────────┘
```
### When to Use Fleet vs Single Agent
Use a Single Agent When:
- Task is straightforward (answer questions, write simple code)
- You need conversational memory across turns
- Cost is a primary concern
- Latency is critical (sub-second responses)
Use a Fleet When:
| Scenario | Why Fleet Wins |
|---|---|
| Code Review | 3 specialists catch more than 1 generalist |
| Incident Response | Parallel analysis of logs, metrics, traces |
| Critical Decisions | Consensus reduces hallucination risk |
| Cost Optimization | Cheap models in parallel vs expensive single model |
| Compliance/Audit | "3 agents agreed on this decision" |
| Specialization | Focused instructions → better results |
### Cost Comparison

```
Single Claude Opus:
- 1 call × $15/1M tokens
- 1 perspective
- ~30s response time

Fleet of 3 Gemini Flash:
- 3 parallel calls × $0.075/1M tokens
- 3 perspectives + consensus
- ~5s response time (parallel execution!)
```
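The comparison above translates into a fleet spec along these lines (a sketch; the fleet and agent names are illustrative, and the model ID follows the tiered example later on this page):

```yaml
# Sketch: three cheap models answer the same question in parallel,
# and majority consensus picks the best response.
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: flash-trio
spec:
  agents:
    - name: analyst-a
      spec:
        model: google:gemini-2.0-flash
    - name: analyst-b
      spec:
        model: google:gemini-2.0-flash
    - name: analyst-c
      spec:
        model: google:gemini-2.0-flash
  coordination:
    mode: peer
    consensus:
      algorithm: majority
      min_votes: 2
```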
## Coordination Modes

AOF supports six coordination modes for different use cases:

### 1. Peer Mode (Default)
All agents work as equals, executing in parallel. Results can be:
- Consensus (default): Pick the best result when agents compete on the same task
- Aggregation: Merge ALL results when agents provide complementary information
#### Peer Mode with Consensus (Competing Perspectives)

Use consensus when multiple agents tackle the same problem and you want to pick the best answer:

```yaml
coordination:
  mode: peer
  distribution: round-robin
  consensus:
    algorithm: majority
    min_votes: 2
```
Best for: Multiple RCA hypotheses, voting scenarios, redundancy
#### Peer Mode with Aggregation (Complementary Specialists)

Use aggregation when specialists provide different but complementary information:

```yaml
coordination:
  mode: peer
  distribution: round-robin
  aggregation: merge  # Collect ALL agent results
```
Best for: Code review (security + quality), multi-expert analysis, parallel data collection
Aggregation Options:

- `merge` - Collect and combine all agent results into a structured output
- `consensus` - Fall back to consensus (default)
- `manager_synthesis` - Use a manager to synthesize (requires hierarchical mode)
How it works:
- Task submitted to all agents simultaneously
- Each agent executes independently (in parallel)
- Results collected from all agents
- Either consensus picks one OR aggregation merges all
```
         ┌──────────┐
         │   Task   │
         └────┬─────┘
              │
      ┌───────┼───────┐
      ▼       ▼       ▼
   ┌──────┐┌──────┐┌──────┐
   │Agent1││Agent2││Agent3│   (parallel)
   └──┬───┘└──┬───┘└──┬───┘
      │       │       │
      └───────┼───────┘
              ▼
      ┌──────────────┐
      │ Consensus OR │
      │ Aggregation  │
      └──────────────┘
```
When to Use Which:
| Scenario | Use | Example |
|---|---|---|
| Same task, pick best | consensus | 3 models diagnose same issue |
| Different specialists, need all | aggregation: merge | Security + Quality reviewers |
| Voting/approval | consensus | 3 agents approve deployment |
| Parallel data collection | aggregation: merge | Logs + Metrics + Traces agents |
### 2. Hierarchical Mode

A manager agent coordinates worker agents, delegating tasks and synthesizing results.

```yaml
coordination:
  mode: hierarchical

agents:
  - name: coordinator
    role: manager
    spec:
      instructions: |
        You coordinate the team. Analyze tasks and delegate to specialists.
        Synthesize their findings into a final report.
  - name: log-analyzer
    role: worker
    spec:
      instructions: Focus only on log analysis.
  - name: metrics-analyzer
    role: worker
    spec:
      instructions: Focus only on metrics analysis.
```
Best for: Complex orchestration, incident response, multi-stage workflows
How it works:
- Manager agent receives task
- Manager analyzes and decides delegation strategy
- Workers execute their assigned portions
- Manager synthesizes final result
```
         ┌──────────┐
         │   Task   │
         └────┬─────┘
              ▼
         ┌──────────┐
         │ Manager  │
         └────┬─────┘
              │ delegates
    ┌─────────┼─────────┐
    ▼         ▼         ▼
┌────────┐┌────────┐┌────────┐
│Worker 1││Worker 2││Worker 3│
└───┬────┘└───┬────┘└───┬────┘
    │         │         │
    └─────────┼─────────┘
              ▼ results
         ┌──────────┐
         │ Manager  │
         │synthesize│
         └──────────┘
```
### 3. Pipeline Mode

Agents execute sequentially, with each agent's output becoming the next agent's input.

```yaml
coordination:
  mode: pipeline

agents:
  - name: data-collector
    spec:
      instructions: Collect and format raw data.
  - name: analyzer
    spec:
      instructions: Analyze the collected data.
  - name: reporter
    spec:
      instructions: Generate a human-readable report.
```
Best for: Data transformation, multi-stage processing, ETL workflows
How it works:
- First agent processes original input
- Output passed to next agent as input
- Continues through all agents sequentially
- Final agent produces the result
```
┌──────────┐    ┌──────────┐    ┌──────────┐
│ Agent 1  │───▶│ Agent 2  │───▶│ Agent 3  │
│(collect) │    │(analyze) │    │(report)  │
└──────────┘    └──────────┘    └──────────┘
   input ──▶ intermediate ──▶ output
```
### 4. Swarm Mode

Self-organizing, dynamic coordination with intelligent load balancing.

```yaml
coordination:
  mode: swarm
  distribution: least-loaded
```
Best for: High-volume task processing, load-balanced workloads
How it works:
- Task arrives
- System selects least-loaded idle agent
- Agent processes task
- Metrics tracked for future balancing
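Combined with replicas (covered under Best Practices below), a swarm fleet might look like the following sketch; the worker definition and model are illustrative:

```yaml
# Sketch: five identical workers, tasks routed to the least-busy one.
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: bulk-processing-fleet
spec:
  agents:
    - name: worker
      replicas: 5  # five instances share the load
      spec:
        model: google:gemini-2.0-flash
        instructions: Process each task independently and return structured output.
  coordination:
    mode: swarm
    distribution: least-loaded
```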
### 5. Tiered Mode (Multi-Model RCA)

Tier-based parallel execution with consensus at each tier. Designed for multi-model scenarios like Root Cause Analysis, where cheap data collectors feed reasoning models.

```yaml
coordination:
  mode: tiered
  consensus:
    algorithm: weighted
    min_confidence: 0.6
  tiered:
    pass_all_results: true
    final_aggregation: manager_synthesis

agents:
  # Tier 1: Data Collectors (cheap models, parallel)
  - name: loki-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash  # ~$0.075/1M tokens
  - name: prometheus-collector
    tier: 1
    spec:
      model: google:gemini-2.0-flash

  # Tier 2: Reasoning Models (multi-model consensus)
  - name: claude-analyzer
    tier: 2
    weight: 1.5  # Higher weight for Claude
    spec:
      model: anthropic:claude-sonnet-4-20250514
  - name: gemini-analyzer
    tier: 2
    weight: 1.0
    spec:
      model: google:gemini-2.5-pro

  # Tier 3: Coordinator (synthesis)
  - name: rca-coordinator
    tier: 3
    role: manager
    spec:
      model: anthropic:claude-sonnet-4-20250514
```
Best for: Multi-model RCA, complex analysis workflows, cost-optimized multi-perspective analysis
How it works:
- Tier 1 agents execute in parallel (cheap data collection)
- Results from Tier 1 are passed to Tier 2 agents
- Tier 2 agents analyze with different LLMs (multi-model consensus)
- Tier 3 manager synthesizes final result
```
┌───────────────────────────────────────────────┐
│                TIERED EXECUTION               │
│                                               │
│  TIER 1: Data Collectors (parallel)           │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │
│  │  Loki  │ │  Prom  │ │  K8s   │ │  Git   │  │
│  │  Logs  │ │Metrics │ │ State  │ │Changes │  │
│  └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘  │
│      └──────────┼──────────┼──────────┘       │
│                 ▼ (collected data)            │
│                                               │
│  TIER 2: Reasoning Models (multi-model)       │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│  │   Claude   │ │   Gemini   │ │   GPT-4    │ │
│  │ (wt: 1.5)  │ │ (wt: 1.0)  │ │ (wt: 1.0)  │ │
│  └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│        └──────────────┼──────────────┘        │
│                       ▼ (weighted consensus)  │
│                                               │
│  TIER 3: Coordinator                          │
│  ┌─────────────────────────────────────────┐  │
│  │             RCA Coordinator              │  │
│  │        (synthesize final report)         │  │
│  └─────────────────────────────────────────┘  │
│                       ▼                       │
│               Final RCA Report                │
└───────────────────────────────────────────────┘
```
Tiered Configuration Options:

```yaml
tiered:
  # Pass all results to next tier (vs just the consensus winner)
  pass_all_results: true

  # Final tier aggregation strategy
  final_aggregation: manager_synthesis  # or: consensus, merge

  # Per-tier consensus configuration
  tier_consensus:
    "1":
      algorithm: first_wins  # Fast data collection
    "2":
      algorithm: weighted    # Multi-model consensus
      min_votes: 2
```
### 6. Deep Mode (Iterative Planning + Execution)

Deep mode adds an agentic loop pattern: planning → execution → re-planning. Unlike the other modes, which execute once and return, deep mode iterates until the goal is achieved or the iteration limit is reached.

```yaml
coordination:
  mode: deep
  deep:
    max_iterations: 10
    planning: true
    memory: true
```
Best for: Complex investigations, root cause analysis, multi-step reasoning tasks
What Deep Mode Adds:
- Planning - LLM generates investigation steps before execution
- Iteration - Execute steps until goal achieved (not just once)
- Re-planning - Adjust plan based on findings mid-execution
- Memory - Persist findings across iterations for context
How it works:
```
User: /fleet rca "why is API returning 500 errors"
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│  DEEP MODE = Agentic Loop (like Claude Code)                │
│                                                             │
│  1. PLAN: Generate investigation steps                      │
│     → Check pods, fetch logs, query metrics, correlate      │
│                                                             │
│  2. EXECUTE: Run each step using appropriate tools          │
│     → kubectl get pods → kubectl logs → promql query        │
│                                                             │
│  3. ITERATE: Continue until goal achieved                   │
│     → Found OOM? Check memory limits. Still unclear? Dig.   │
│                                                             │
│  4. SYNTHESIZE: Produce final answer with evidence          │
│     → Root cause + evidence + recommendations               │
└─────────────────────────────────────────────────────────────┘
```
Example Output:

```
🔍 RCA Fleet - Investigating...

📋 Plan:
  1. Check pod status
  2. Fetch error logs
  3. Query metrics
  4. Correlate findings

⏳ Step 1/4: Checking pods...
   → 2/3 pods in CrashLoopBackOff
⏳ Step 2/4: Fetching logs...
   → OutOfMemoryError at 14:32
⏳ Step 3/4: Querying metrics...
   → Memory at 98% before crash
⏳ Step 4/4: Correlating...

✅ Root Cause: Memory leak in API service

Evidence:
• Pods crashing with OOMKilled
• Memory at 98% before crash
• 3x traffic spike at 14:30

Recommendations:
1. Increase memory limits (immediate)
2. Profile for memory leak (long-term)
```
Deep Configuration Options:

```yaml
coordination:
  mode: deep
  deep:
    # Safety limit - stop after N iterations
    max_iterations: 10

    # Enable planning phase (LLM generates steps)
    planning: true

    # Persist findings in memory across iterations
    memory: true

    # Optional: Override model for planning
    planner_model: anthropic:claude-sonnet-4-20250514

    # Optional: Custom planning prompt
    planner_prompt: |
      You are planning an investigation. Generate steps to find the root cause.

    # Optional: Custom synthesis prompt
    synthesizer_prompt: |
      Synthesize findings into a clear report with evidence.
```
When to Use Deep vs Other Modes:
| Scenario | Recommended Mode |
|---|---|
| Multiple perspectives on same task | Peer + Consensus |
| Manager delegates to workers | Hierarchical |
| Sequential data transformation | Pipeline |
| High-volume parallel processing | Swarm |
| Multi-model analysis (cheap → expensive) | Tiered |
| Complex investigation needing iteration | Deep |
## Consensus Algorithms

When using Peer or Tiered modes, consensus determines how agent results are aggregated.

### Supported Algorithms
| Algorithm | Rule | Use Case |
|---|---|---|
| majority | >50% must agree | Code review, incident triage |
| unanimous | 100% must agree | Critical deployments, security |
| weighted | Votes weighted by role | Senior > Junior reviewers |
| first_wins | First response wins | Time-critical scenarios |
| human_review | Always flag for human | High-stakes decisions |
### Algorithm Details

#### Majority

More than 50% of agents must agree on the result. Fast and tolerant of outliers.

#### Unanimous

100% of agents must agree. Use for critical decisions where false positives are costly.
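For example, a deployment-approval gate could require unanimity. A minimal sketch using the fields from the Configuration section below (the timeout value is an assumption):

```yaml
consensus:
  algorithm: unanimous
  allow_partial: false  # do not accept a result if any agent fails to respond
  timeout_ms: 120000    # assumption: allow extra time so slow agents still vote
```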
#### Weighted

Each agent has a configurable weight, so senior reviewers can count for more than juniors.

```yaml
agents:
  - name: senior-reviewer
    weight: 2.0  # Counts as 2 votes
  - name: junior-reviewer
    weight: 1.0  # Counts as 1 vote

consensus:
  algorithm: weighted
  weights:
    senior-reviewer: 2.0
    junior-reviewer: 1.0
```
#### FirstWins

First agent to respond wins. Use when speed matters more than consensus.
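Selecting it is a one-line change (a sketch; the short timeout is an assumption):

```yaml
consensus:
  algorithm: first_wins
  timeout_ms: 10000  # assumption: a short window fits time-critical use
```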
#### HumanReview

Always flags the result for a human operator's decision. Use for high-stakes scenarios.

```yaml
consensus:
  algorithm: human_review
  min_confidence: 0.9  # Below this confidence, review is definitely required
```
### Configuration

```yaml
coordination:
  mode: peer
  consensus:
    algorithm: majority   # majority, unanimous, weighted, first_wins, human_review
    min_votes: 2          # Minimum responses required
    timeout_ms: 60000     # Max wait time (60 seconds)
    allow_partial: true   # Accept result if some agents fail
    min_confidence: 0.7   # Below this, flag for human review
    weights:              # Per-agent weights (for weighted algorithm)
      senior-reviewer: 2.0
      junior-reviewer: 1.0
```
### Example: Code Review Consensus

```
Security Agent:    "CRITICAL: SQL injection on line 42"
Performance Agent: "No SQL issues, but N+1 query problem"
Quality Agent:     "SQL injection on line 42, missing tests"

Consensus (majority):
- SQL injection CONFIRMED (2/3 agree)
- N+1 query flagged (1/3, noted but not critical)
- Missing tests flagged (1/3, noted)
```
## Task Distribution Strategies

How tasks are assigned to agents (for Hierarchical and Swarm modes):
| Strategy | Description | Best For |
|---|---|---|
| round-robin | Cycle through agents | Even distribution |
| least-loaded | Agent with fewest tasks | Load balancing |
| random | Random selection | Simple scenarios |
| skill-based | Match agent skills to task | Specialized work |
| sticky | Same task type → same agent | Caching benefits |
```yaml
coordination:
  mode: hierarchical
  distribution: least-loaded  # or round-robin, random, skill-based, sticky
```
## Agent Roles
Agents can have defined roles that affect their behavior in the fleet:
| Role | Description | Used In |
|---|---|---|
| worker | Regular task executor | All modes |
| manager | Coordinator/orchestrator | Hierarchical mode |
| specialist | Domain expert | Any mode |
| validator | Review/validation | Quality gates |
```yaml
agents:
  - name: security-expert
    role: specialist
    spec:
      instructions: You are a security specialist...
  - name: team-lead
    role: manager
    spec:
      instructions: You coordinate the team...
```
## Communication Patterns

Agents can communicate through shared memory and messaging:

### Shared Memory Types
| Type | Description | Use Case |
|---|---|---|
| in_memory | RAM-based | Single process, testing |
| redis | Distributed cache | Multi-instance, real-time |
| sqlite | Local database | Persistent, single node |
| postgres | Distributed DB | Production, multi-node |
### Message Patterns
| Pattern | Description | Use Case |
|---|---|---|
| direct | Point-to-point | Agent-to-agent |
| broadcast | All agents receive | Announcements |
| pub_sub | Topic-based | Event-driven |
| request_reply | Request-response | Queries |
```yaml
communication:
  pattern: broadcast
  broadcast:
    channel: team-updates
    include_sender: false
  shared:
    memory:
      type: redis
      url: redis://localhost:6379
```
## Real-World Examples

### Example 1: Code Review Fleet (with Aggregation)

Two specialists review code in parallel. Uses aggregation to collect ALL findings (not consensus):
```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: code-review-fleet
spec:
  agents:
    - name: security-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on security: SQL injection, XSS, authentication,
          secrets in code, dependency vulnerabilities.
          Format: ## Security Review with Critical/High/Medium sections.
    - name: quality-reviewer
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Focus on quality: SOLID principles, error handling,
          code structure, naming conventions.
          Format: ## Quality Review with Issues/Suggestions/Score.
  coordination:
    mode: peer
    distribution: round-robin
    aggregation: merge  # Collect ALL findings from both specialists
```
Run it:

```bash
aofctl run fleet code-review-fleet.yaml \
  --input "Review: function login(user, pass) { const query = 'SELECT * FROM users WHERE name=' + user; return db.query(query); }"
```
Output (both reviews merged):

```
╔═════════════════════════════════════════════════════════╗
║                 FLEET RESULTS (2 agents)                 ║
╚═════════════════════════════════════════════════════════╝

┌─ security-reviewer
│  ## Security Review
│  ### Critical Issues
│  * SQL Injection: Direct concatenation of user input...
└

┌─ quality-reviewer
│  ## Quality Review
│  ### Issues Found
│  1. Unused parameter 'pass'
│  2. No error handling
│  ### Score: 2/10
└
```
### Example 1b: Code Review Fleet (with Consensus)

Same task but with consensus - picks the most agreed-upon findings:

```yaml
coordination:
  mode: peer
  distribution: round-robin
  consensus:
    algorithm: majority
    min_votes: 2
```
Use consensus when: Multiple agents analyze the same aspect and you want to validate findings (e.g., 3 security reviewers must agree on vulnerabilities).
### Example 2: Incident Response Fleet

Hierarchical coordination with a manager orchestrating specialists:

```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: incident-response-team
spec:
  agents:
    - name: incident-commander
      role: manager
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You are the Incident Commander. When an incident arrives:
          1. Assess severity and impact
          2. Delegate investigation to specialists
          3. Coordinate response actions
          4. Synthesize findings into an incident report
        tools:
          - shell
    - name: log-investigator
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze logs. Search for errors, exceptions, and anomalies.
          Report timeline of events and root cause indicators.
        tools:
          - shell
          - read_file
    - name: metrics-analyst
      role: specialist
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          You analyze metrics and dashboards. Look for:
          - Resource exhaustion (CPU, memory, disk)
          - Traffic anomalies
          - Error rate spikes
        tools:
          - shell
  coordination:
    mode: hierarchical
    distribution: skill-based
```
### Example 3: Data Pipeline Fleet

Sequential processing with each stage building on the previous:

```yaml
apiVersion: aof.dev/v1
kind: AgentFleet
metadata:
  name: data-pipeline
spec:
  agents:
    - name: collector
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Collect and normalize raw data from the input source.
          Output clean, structured JSON.
    - name: analyzer
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Analyze the structured data. Identify patterns, anomalies,
          and key insights. Output analysis summary.
    - name: reporter
      spec:
        model: google:gemini-2.5-flash
        instructions: |
          Generate a human-readable report from the analysis.
          Include executive summary, key findings, and recommendations.
  coordination:
    mode: pipeline
```
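Run it the same way as the earlier examples (the input text here is purely illustrative):

```bash
aofctl run fleet data-pipeline.yaml \
  --input "Raw order export from the last 24 hours"
```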
## Fleet Metrics

Fleets automatically track execution metrics:

```yaml
metrics:
  total_tasks: 150
  completed_tasks: 145
  failed_tasks: 5
  avg_task_duration_ms: 2340
  active_agents: 3
  total_agents: 3
  consensus_rounds: 145  # For peer mode
```

Access via CLI:

```bash
aofctl describe fleet code-review-fleet
```
## Token Usage Tracking

Fleet execution automatically tracks token usage for each agent and provides aggregated totals. This is useful for cost monitoring and optimization.

### Token Usage in Output

After fleet execution, token usage is displayed in the completion summary:
```
╭─────────────────────────────────────────────╮
│  🚀 FLEET EXECUTION COMPLETE                │
├─────────────────────────────────────────────┤
│  Fleet: code-review-fleet                   │
│  Duration: 5.23s                            │
├─────────────────────────────────────────────┤
│  Token Usage:                               │
│    Input:  1,234 tokens                     │
│    Output:   567 tokens                     │
│    Total:  1,801 tokens                     │
╰─────────────────────────────────────────────╯
```
### Token Usage in JSON Output

When using `--output json`, token usage is included in the result:

```json
{
  "results": [...],
  "agent_count": 2,
  "usage": {
    "input_tokens": 1234,
    "output_tokens": 567,
    "total_tokens": 1801
  }
}
```
### Per-Agent Token Tracking

Each agent's token usage is tracked individually and included in the result when using `aggregation: merge`:

```json
{
  "results": [
    {
      "agent": "security-reviewer",
      "response": "...",
      "input_tokens": 612,
      "output_tokens": 284
    },
    {
      "agent": "quality-reviewer",
      "response": "...",
      "input_tokens": 622,
      "output_tokens": 283
    }
  ],
  "usage": {
    "input_tokens": 1234,
    "output_tokens": 567,
    "total_tokens": 1801
  }
}
```
### Cost Estimation

Use token counts to estimate costs based on your model's pricing:

```
# Gemini 2.5 Flash pricing (~$0.075/1M input, ~$0.30/1M output)
# 1,234 input + 567 output ≈ $0.00026

# Claude Sonnet pricing (~$3/1M input, ~$15/1M output)
# Same tokens ≈ $0.012
```
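The arithmetic behind those estimates, as a general formula:

```
cost ≈ (input_tokens × input_price + output_tokens × output_price) / 1,000,000

Gemini 2.5 Flash: (1,234 × $0.075 + 567 × $0.30)  / 1,000,000 ≈ $0.00026
Claude Sonnet:    (1,234 × $3.00  + 567 × $15.00) / 1,000,000 ≈ $0.012
```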
## Best Practices

### 1. Choose the Right Mode
| Situation | Recommended Mode |
|---|---|
| Need multiple perspectives | Peer + Consensus |
| Complex orchestration | Hierarchical |
| Sequential processing | Pipeline |
| High-volume, uniform tasks | Swarm |
### 2. Optimize Agent Instructions

Each agent should have focused, specific instructions:

```yaml
# ❌ Bad: Too generic
instructions: Review the code for issues.
```

```yaml
# ✅ Good: Focused and specific
instructions: |
  You are a SECURITY specialist. Focus ONLY on:
  - SQL injection vulnerabilities
  - XSS attack vectors
  - Authentication/authorization flaws
  - Secrets or credentials in code
  - Insecure dependencies
  Ignore performance and style issues - other agents handle those.
```
### 3. Use Appropriate Consensus
| Scenario | Algorithm |
|---|---|
| General review | majority |
| Security-critical | unanimous |
| Mixed seniority | weighted |
| Time-critical | first_wins |
### 4. Set Reasonable Timeouts

```yaml
consensus:
  timeout_ms: 60000    # 60 seconds for most tasks
  allow_partial: true  # Don't fail if one agent is slow
```
### 5. Use Replicas for Scaling

```yaml
agents:
  - name: worker
    replicas: 3  # 3 instances for load balancing
```
## Summary
| Aspect | Single Agent | Fleet |
|---|---|---|
| Perspectives | 1 | Multiple |
| Reliability | Single point of failure | Consensus-validated |
| Cost | Can be expensive | Often cheaper (parallel cheap models) |
| Latency | Sequential | Parallel execution |
| Specialization | Generalist | Deep expertise per agent |
| Audit Trail | Limited | Full consensus history |
Rule of thumb: Use fleets for anything critical, multi-perspective, or high-volume. Use single agents for simple, conversational, or cost-sensitive tasks.
Next Steps: