Like AOF? Give us a star!

If you find AOF useful, please star us on GitHub. It helps us reach more developers and grow the community.

AgentFlow YAML Specification

Complete reference for AgentFlow workflow specifications.

Overview

An AgentFlow is a multi-step workflow that orchestrates agents, tools, and integrations in a directed graph. Flows define the sequence of operations with nodes and connections.

Key Characteristics:

Flows are pure workflows - no embedded triggers
Triggers reference flows - via command bindings in Trigger CRDs
Entry point is "start" - connections begin from the "start" node
Nodes execute sequentially or conditionally - based on connections

Basic Structure

apiVersion: aof.dev/v1
kind: AgentFlow
metadata:
  name: string              # Required: Unique identifier
  namespace: string         # Optional: Namespace for isolation
  labels:                   # Optional: Key-value labels
    key: value
  annotations:              # Optional: Additional metadata
    key: value

spec:
  description: string       # Optional: Human-readable description

  context:                  # Optional: Execution context
    kubeconfig: string      # Path to kubeconfig
    namespace: string       # Default K8s namespace
    cluster: string         # Cluster name
    env:                    # Environment variables
      KEY: value
    working_dir: string

  nodes:                    # Required: Flow steps
    - id: string
      type: NodeType
      config:
        agent: string         # For Agent nodes: agent name
        fleet: string         # For Fleet nodes: fleet name
        input: string         # Input to pass
        tools: []             # Optional: Override agent tools
        mcp_servers: []       # Optional: MCP server configs
      conditions: []

  connections:              # Required: Node edges
    - from: string          # Source node (or "start")
      to: string            # Target node
      condition: string     # Optional: Condition expression

  config:                   # Optional: Global flow config
    default_timeout_seconds: int
    verbose: bool
    retry:
      max_attempts: int
      initial_delay: string
      backoff_multiplier: float

Metadata

`metadata.name`

Type: string Required: Yes

Unique identifier for the flow. Used in CLI commands and referenced by Trigger command bindings.

`metadata.labels`

Type: map[string]string Required: No

Key-value pairs for categorization and filtering.

Example:

metadata:
  name: deploy-flow
  labels:
    purpose: deployment
    team: platform
    environment: production

Spec Fields

`spec.description`

Type: string Required: No

Human-readable description of what this flow does.

spec:
  description: "Production deployment workflow with approval gates"

`spec.context`

Type: object Required: No

Execution context for the flow including environment variables and Kubernetes configuration.

spec:
  context:
    kubeconfig: ~/.kube/config
    namespace: production
    cluster: prod-cluster
    env:
      ENVIRONMENT: production
      LOG_LEVEL: info
    working_dir: /app

Node Types

Nodes define what happens at each step of the flow.

Agent Node

Execute a single agent. Agent nodes support two configuration methods:

Option 1: Reference External Agent

Reference an agent defined in a separate YAML file:

nodes:
  - id: analyze
    type: Agent
    config:
      agent: k8s-agent           # Agent name or path
      input: "${event.text}"     # Input from trigger event
      timeout_seconds: 120

Option 2: Inline Agent Definition

Define the agent directly in the flow (recommended for simple, single-use agents):

nodes:
  - id: check-status
    type: Agent
    config:
      inline:
        name: status-checker
        model: google:gemini-2.5-flash
        instructions: |
          Check the system status and report any issues.
          Format your response as a summary.
        tools:
          - shell
        temperature: 0.1
      input: "${event.text}"

Inline Agent Fields:

Field	Type	Required	Description
`name`	string	Yes	Agent identifier
`model`	string	Yes	Model to use (e.g., `google:gemini-2.5-flash`)
`instructions`	string	No	System prompt / instructions
`tools`	list	No	Tools available to the agent
`mcp_servers`	list	No	MCP server configurations
`temperature`	float	No	Model temperature (default: 0.7)
`max_tokens`	int	No	Maximum response tokens

When to use each approach:

Use Case	Recommended
Reusable agent across multiple flows	External agent file
Complex agent with many tools	External agent file
Simple, single-purpose step	Inline definition
Self-contained flow (no external deps)	Inline definition

Fleet Node

Execute a fleet of agents (multi-agent coordination).

nodes:
  - id: diagnose
    type: Fleet
    config:
      fleet: rca-fleet           # Fleet name or path
      input: "${event.text}"

Script Node

Execute shell commands or native tools without LLM involvement. Script nodes are ideal for deterministic operations that don't require intelligence, such as:

Running shell commands (like Jenkins pipelines)
Executing docker, kubectl, or other CLI tools
Parsing logs or data files
Making HTTP requests
File operations

Script nodes are faster and more predictable than Agent nodes since they don't use LLM tokens.

Option 1: Shell Command Execution

nodes:
  - id: check-containers
    type: Script
    config:
      scriptConfig:
        command: docker ps -a --format "{{json .}}"
        parse: json              # Parse output as JSON
        timeout_seconds: 30
        fail_on_error: true      # Fail flow if command fails

Option 2: Native Tool Execution

Use built-in native tools for common operations:

nodes:
  - id: get-pods
    type: Script
    config:
      scriptConfig:
        tool: kubectl            # Built-in tool
        action: get              # Tool action
        args:
          resource: pods
          namespace: production

Available Native Tools:

Tool	Actions	Description
`docker`	`ps`, `logs`, `inspect`, `stats`	Container operations
`kubectl`	`get`, `logs`, `describe`	Kubernetes operations
`http`	`get`, `post`, `put`, `delete`	HTTP requests
`json`	`parse`, `extract`, `merge`	JSON transformation
`file`	`read`, `write`, `exists`, `list`	File operations

Script Configuration Fields:

Field	Type	Required	Default	Description
`command`	string	*	-	Shell command to execute
`tool`	string	*	-	Native tool to use
`action`	string	No	"run"	Tool action
`args`	object	No		Arguments for tool
`working_dir`	string	No	-	Working directory
`env`	object	No		Environment variables
`timeout_seconds`	int	No	60	Execution timeout
`parse`	string	No	"text"	Output parsing mode
`pattern`	string	No	-	Regex pattern (for parse: regex)
`fail_on_error`	bool	No	true	Fail on non-zero exit

* Either command or tool is required.

Output Parsing Modes:

Mode	Description
`text`	Raw text output (default)
`json`	Parse as JSON object
`lines`	Split into array of lines
`regex`	Apply regex pattern with named groups

Example: Docker Diagnostics with Script Nodes

apiVersion: aof.dev/v1
kind: AgentFlow
metadata:
  name: docker-diagnostics
spec:
  description: "Docker health check using Script nodes"

  nodes:
    # Native tool: Get container status
    - id: check-status
      type: Script
      config:
        scriptConfig:
          tool: docker
          action: ps
          args:
            all: true

    # Shell command: Get logs
    - id: get-logs
      type: Script
      config:
        scriptConfig:
          command: |
            docker ps -a --filter "status=exited" --format "{{.Names}}" | \
            while read name; do
              echo "=== $name ==="
              docker logs --tail 20 "$name" 2>&1
            done
          parse: text

    # LLM analysis of Script outputs
    - id: analyze
      type: Agent
      config:
        inline:
          name: analyzer
          model: google:gemini-2.5-flash
          instructions: "Analyze container diagnostics and provide recommendations."
        input: |
          Status: ${check-status.output}
          Logs: ${get-logs.output}

  connections:
    - from: start
      to: check-status
    - from: check-status
      to: get-logs
    - from: get-logs
      to: analyze

When to use Script vs Agent:

Use Case	Recommended
Running CLI commands	Script
Fetching data from APIs	Script
Parsing logs or files	Script
Intelligent analysis	Agent
Decision making	Agent
Natural language generation	Agent

HumanApproval Node

Wait for human approval before proceeding.

nodes:
  - id: approval
    type: HumanApproval
    config:
      message: "Approve deployment to production?"
      timeout_seconds: 300       # 5 minute timeout
      fallback: reject           # Action if timeout: reject | approve

Conditional Node

Branch based on conditions.

nodes:
  - id: check-env
    type: Conditional
    config:
      expression: "${context.env} == 'production'"
      true_target: production-deploy
      false_target: staging-deploy

End Node

Terminal node that produces final output.

nodes:
  - id: complete
    type: End
    config:
      message: "${previous.output}"

Response Node

Send response to the triggering platform (Slack, Telegram, etc.).

nodes:
  - id: notify
    type: Response
    config:
      message: "${analyze.output}"
      format: markdown           # text | markdown | blocks

Connections

Connections define the flow between nodes. Every flow starts from the implicit "start" node.

Basic Connection

connections:
  - from: start
    to: validate
  - from: validate
    to: deploy
  - from: deploy
    to: notify

Conditional Connection

connections:
  - from: approval
    to: deploy
    condition: approved          # Only if approved
  - from: approval
    to: notify-rejected
    condition: rejected          # Only if rejected

Multiple Outputs

A node can connect to multiple targets based on conditions:

connections:
  - from: analyze
    to: critical-path
    condition: severity == "critical"
  - from: analyze
    to: normal-path
    condition: severity != "critical"

Complete Examples

Deployment Flow with Approval

apiVersion: aof.dev/v1
kind: AgentFlow
metadata:
  name: deploy-flow
  labels:
    purpose: deployment
spec:
  description: "Production deployment with approval gate"

  nodes:
    - id: validate
      type: Agent
      config:
        agent: validator-agent
        input: "Validate deployment: ${event.text}"

    - id: approval
      type: HumanApproval
      config:
        message: |
          Deployment validated. Approve deployment?
          Version: ${validate.output.version}
          Target: ${validate.output.target}
        timeout_seconds: 600

    - id: deploy
      type: Agent
      config:
        agent: k8s-agent
        input: "Deploy ${validate.output.manifest}"

    - id: notify-success
      type: Response
      config:
        message: "✅ Deployment complete: ${deploy.output}"

    - id: notify-rejected
      type: Response
      config:
        message: "❌ Deployment rejected by user"

  connections:
    - from: start
      to: validate
    - from: validate
      to: approval
    - from: approval
      to: deploy
      condition: approved
    - from: approval
      to: notify-rejected
      condition: rejected
    - from: deploy
      to: notify-success

Root Cause Analysis Flow

apiVersion: aof.dev/v1
kind: AgentFlow
metadata:
  name: rca-flow
spec:
  description: "Multi-model root cause analysis with consensus"

  nodes:
    - id: collect
      type: Fleet
      config:
        fleet: rca-fleet
        input: "${event.text}"

    - id: respond
      type: Response
      config:
        message: "${collect.output}"
        format: markdown

  connections:
    - from: start
      to: collect
    - from: collect
      to: respond

Incident Response Flow

apiVersion: aof.dev/v1
kind: AgentFlow
metadata:
  name: incident-flow
spec:
  description: "Automated incident response workflow"

  nodes:
    - id: assess
      type: Agent
      config:
        agent: monitoring-agent
        input: "Assess incident: ${event.text}"

    - id: route
      type: Conditional
      config:
        expression: "${assess.output.severity}"
        cases:
          critical: escalate
          high: page-oncall
          default: log-ticket

    - id: escalate
      type: Agent
      config:
        agent: pagerduty-agent
        input: "Create P1 incident: ${assess.output}"

    - id: page-oncall
      type: Agent
      config:
        agent: slack-agent
        input: "Page on-call: ${assess.output}"

    - id: log-ticket
      type: Agent
      config:
        agent: jira-agent
        input: "Create ticket: ${assess.output}"

  connections:
    - from: start
      to: assess
    - from: assess
      to: route
    - from: route
      to: escalate
      condition: critical
    - from: route
      to: page-oncall
      condition: high
    - from: route
      to: log-ticket
      condition: default

Variable Substitution

Flows support variable substitution using ${...} syntax.

Event Variables

From the triggering event (via Trigger):

Variable	Description
`${event.text}`	Message text
`${event.user.id}`	User ID
`${event.user.name}`	User display name
`${event.channel}`	Channel/chat ID
`${event.platform}`	Platform name (slack, telegram)
`${event.timestamp}`	Event timestamp

Node Output Variables

Access output from previous nodes:

Variable	Description
`${node_id.output}`	Full output from node
`${node_id.output.field}`	Specific field from output
`${previous.output}`	Output from previous node

Context Variables

From flow context:

Variable	Description
`${context.namespace}`	Kubernetes namespace
`${context.cluster}`	Cluster name
`${context.env.KEY}`	Environment variable

Retry Configuration

Configure retry behavior for failed nodes:

spec:
  config:
    retry:
      max_attempts: 3
      initial_delay: 1s
      backoff_multiplier: 2.0
      max_delay: 30s

Per-node retry:

nodes:
  - id: deploy
    type: Agent
    config:
      agent: k8s-agent
      retry:
        max_attempts: 5
        initial_delay: 2s

Timeout Configuration

Global timeout:

spec:
  config:
    default_timeout_seconds: 300

Per-node timeout:

nodes:
  - id: long-operation
    type: Agent
    config:
      agent: batch-agent
      timeout_seconds: 600

Usage with Triggers

Flows are invoked via Trigger command bindings:

# In a Trigger CRD
apiVersion: aof.dev/v1
kind: Trigger
metadata:
  name: slack-production
spec:
  type: Slack
  config:
    bot_token: ${SLACK_BOT_TOKEN}
    signing_secret: ${SLACK_SIGNING_SECRET}

  commands:
    /deploy:
      flow: deploy-flow          # References this AgentFlow
      description: "Deploy to production"
    /rca:
      flow: rca-flow
      description: "Root cause analysis"

  default_agent: devops

CLI Usage

Run a flow directly:

# Execute a flow with input
aofctl run flow examples/flows/deploy-flow.yaml -i "deploy v2.1.0 to production"

# Describe a flow
aofctl describe flow examples/flows/deploy-flow.yaml

# Validate flow YAML
aofctl validate flows/deploy-flow.yaml

Best Practices

Keep flows focused - One flow per logical workflow
Use descriptive node IDs - validate, deploy, not step1, step2
Add descriptions - Document what the flow does
Handle failure paths - Use conditional connections for error cases
Set appropriate timeouts - Prevent runaway executions
Use fleets for complex analysis - Multi-agent coordination for RCA
Use Script nodes for deterministic operations - Save LLM tokens by using Script nodes for data collection, API calls, and file operations. Reserve Agent nodes for tasks requiring intelligence.
Combine Script + Agent nodes - Use Script nodes to gather data, then pass results to Agent nodes for analysis

Overview​

Basic Structure​

Metadata​

metadata.name​

metadata.labels​

Spec Fields​

spec.description​

spec.context​

Node Types​

Agent Node​

Option 1: Reference External Agent​

Option 2: Inline Agent Definition​

Fleet Node​

Script Node​

Option 1: Shell Command Execution​

Option 2: Native Tool Execution​

HumanApproval Node​

Conditional Node​

End Node​

Response Node​

Connections​

Basic Connection​

Conditional Connection​

Multiple Outputs​

Complete Examples​

Deployment Flow with Approval​

Root Cause Analysis Flow​

Incident Response Flow​

Variable Substitution​

Event Variables​

Node Output Variables​

Context Variables​

Retry Configuration​

Timeout Configuration​

Usage with Triggers​

CLI Usage​

Best Practices​