Patterns for coordinating autonomous agents across shared infrastructure, with emphasis on resilience through provider alternation and model fallback chains.

Overview

A multi-agent system coordinates multiple AI agents with distinct roles and capabilities. The commune uses agent profiles with specialized models and fallback chains to ensure no single provider failure blocks the entire system.

Key insight: Provider alternation at each fallback step prevents cascade failures. If OpenRouter has billing issues, agents fall back to Anthropic. If Anthropic is rate-limited, fall back to another OpenRouter model.

Agent Profiles

Agents spawn with specific roles, each optimized for different tasks:

Agent        | Primary Model                          | Use Case
research 🔬  | openrouter/google/gemini-2.5-flash     | Fast information gathering, web searches, data aggregation
reasoning 🧠 | openrouter/deepseek/deepseek-r1-0528¹  | Complex problem-solving, deep analysis, logic chains
ci-triage 🔧 | openrouter/mistralai/devstral-2512     | Code review, CI failure diagnosis, debugging
mundane 📋   | Free OpenRouter models (MiMo, Qwen)    | Formatting, extraction, simple transformations

Why Specialized Agents?

Different tasks have different requirements:

  • Speed (research): Fast model, parallelize queries, aggregate results
  • Depth (reasoning): Slow, thorough model with extended thinking time
  • Domain expertise (ci-triage): Code-focused model trained on dev patterns
  • Cost efficiency (mundane): Free or cheap models for simple work

Spawning specialized subagents means the main agent (conversational interface) doesn’t pay Opus prices for extract-and-format tasks.

Fallback Chains

Each agent profile defines a fallback chain — a sequence of models to try if the primary is unavailable.

Provider Alternation Pattern

The problem: If all fallbacks use the same provider, a single provider outage blocks the entire chain.

The solution: Alternate between providers at each step.

Example: Research Agent

Primary:   openrouter/google/gemini-2.5-flash
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/deepseek/deepseek-chat
Fallback3: anthropic/claude-sonnet-4-5

Fallback chain design: Provider alternation prevents cascade failures.²

Why this works:

  • OpenRouter billing issue? → Falls back to Anthropic (haiku)
  • Anthropic rate limit? → Falls back to OpenRouter (deepseek-chat)
  • Both providers degraded? → Falls back to Anthropic again (sonnet)

Each step switches providers, so a single-provider failure only affects one tier.
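
A chain definition can be checked for alternation mechanically. A sketch, assuming the provider is simply the first path segment of the model ID (as in all the IDs above):

```python
def provider_of(model: str) -> str:
    """Provider is the first path segment of the model ID, e.g. 'anthropic'."""
    return model.split("/", 1)[0]

def alternates_providers(chain: list[str]) -> bool:
    """True if no two adjacent tiers in the chain share a provider."""
    return all(provider_of(a) != provider_of(b) for a, b in zip(chain, chain[1:]))

research_chain = [
    "openrouter/google/gemini-2.5-flash",
    "anthropic/claude-haiku-4-5",
    "openrouter/deepseek/deepseek-chat",
    "anthropic/claude-sonnet-4-5",
]
```

`alternates_providers(research_chain)` returns True; an all-OpenRouter chain like the pre-incident one described under Lessons Learned returns False.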

Example: Reasoning Agent

Primary:   openrouter/deepseek/deepseek-r1-0528
Fallback1: anthropic/claude-sonnet-4-5
Fallback2: anthropic/claude-opus-4-5

Reasoning tasks benefit from Anthropic’s extended thinking capability, so after the initial OpenRouter attempt the fallbacks deliberately stay within Anthropic. This is an intentional exception to strict provider alternation.

Example: CI Triage Agent

Primary:   openrouter/mistralai/devstral-2512
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/qwen/qwen-2.5-coder-32b-instruct
Fallback3: anthropic/claude-sonnet-4-5

Code-focused primary, fast Anthropic fallback, free coding model, then high-quality Sonnet as last resort.

Architecture Diagram

graph TD
    Main[Main Agent<br/>discord-relay]
    
    Main -->|spawn| Research[Research Agent 🔬<br/>gemini-2.5-flash]
    Main -->|spawn| Reasoning[Reasoning Agent 🧠<br/>deepseek-r1]
    Main -->|spawn| CI[CI Triage 🔧<br/>devstral-2512]
    
    Research -.->|fallback| RH[haiku]
    RH -.->|fallback| RD[deepseek-chat]
    RD -.->|fallback| RS[sonnet]
    
    Reasoning -.->|fallback| ReS[sonnet]
    ReS -.->|fallback| ReO[opus]
    
    CI -.->|fallback| CH[haiku]
    CH -.->|fallback| CQ[qwen-coder]
    CQ -.->|fallback| CS[sonnet]
    
    style Main fill:#5c7cfa
    style Research fill:#37b24d
    style Reasoning fill:#f59f00
    style CI fill:#e64980
    style RH fill:#868e96
    style RD fill:#868e96
    style RS fill:#868e96
    style ReS fill:#868e96
    style ReO fill:#868e96
    style CH fill:#868e96
    style CQ fill:#868e96
    style CS fill:#868e96

Color coding:

  • Blue: Main conversational agent
  • Green/Orange/Pink: Specialized subagents
  • Gray: Fallback tiers

Model Selection Strategies

Tier by Complexity

Tier     | Models                                                             | When to Use
Free     | openrouter/xiaomi/mimo-v2-flash, qwen/qwen-2.5-coder-32b-instruct  | Extract, format, summarize, validate; 20x cheaper than Sonnet
Fast     | anthropic/claude-haiku-4-5, openrouter/google/gemini-2.5-flash     | Research, data gathering, simple reasoning
Standard | anthropic/claude-sonnet-4-5, openrouter/deepseek/deepseek-r1       | Most agent work: complex reasoning, writing, analysis
Premium  | anthropic/claude-opus-4-5                                          | Judgment calls, creative work, high-stakes decisions

When to Spawn a Subagent

Spawn subagents when:

  • Task is delegable — clear input/output, doesn’t require conversation context
  • Latency is acceptable — research/analysis can take minutes
  • Cost matters — use cheaper model for this subtask
  • Parallelization helps — multiple subagents work simultaneously

Don’t spawn when:

  • Conversation context critical — user’s current question needs full chat history
  • Instant response expected — greeting, clarification, quick lookup
  • Task is tiny — spawning overhead exceeds task time

Resilience Patterns

Timeout + Retry

Each model call has a timeout. If it exceeds the limit, fall back immediately rather than waiting indefinitely.

Request → Primary (timeout 60s) → Fallback1 (timeout 45s) → Fallback2 (timeout 30s)

Later fallbacks get shorter timeouts: by the time the chain reaches them, the request has already spent most of its latency budget, so each remaining tier must succeed quickly or fail fast.
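
A sketch of this pattern in Python. `call_model` here is a placeholder for the real provider client and is assumed to enforce its own timeout (raising on expiry, as most HTTP clients do):

```python
# (model, timeout_seconds): timeouts shrink at each tier to bound total latency
TIERS = [
    ("openrouter/google/gemini-2.5-flash", 60),
    ("anthropic/claude-haiku-4-5", 45),
    ("openrouter/deepseek/deepseek-chat", 30),
]

def call_with_fallback(prompt, call_model, tiers=TIERS):
    """Try each tier in order; on timeout or any provider error, fall through.

    call_model(model, prompt, timeout=...) is assumed to raise on timeout.
    """
    errors = []
    for model, timeout in tiers:
        try:
            return model, call_model(model, prompt, timeout=timeout)
        except Exception as exc:
            errors.append((model, exc))  # timeout or provider error: next tier
    raise RuntimeError(f"all fallback tiers exhausted: {errors}")
```

The return value includes which model actually answered, which is useful for logging how often each tier fires.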

Budget Guards

Track token usage per session. If approaching budget limit, switch to cheaper models or warn the user.

Example: CI triage agent has a per-PR token budget. If it burns through 50K tokens diagnosing failures, it escalates to human review rather than continuing to waste tokens.

Backoff + Jitter

When a provider returns rate-limit errors, implement exponential backoff with jitter before retrying:

Attempt 1: immediate
Attempt 2: wait 1s ± 0.5s
Attempt 3: wait 2s ± 1s
Attempt 4: wait 4s ± 2s

Random jitter prevents thundering herd (all agents retrying simultaneously).
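
The schedule above can be sketched as a generator; the base delay and ±50% jitter width are taken from the table:

```python
import random

def backoff_delays(max_attempts: int = 4, base: float = 1.0):
    """Yield per-attempt delays: immediate first try, then doubling with jitter."""
    yield 0.0  # attempt 1: immediate
    for attempt in range(2, max_attempts + 1):
        delay = base * 2 ** (attempt - 2)          # 1s, 2s, 4s, ...
        jitter = random.uniform(-delay / 2, delay / 2)
        yield delay + jitter
```

In practice each agent would `time.sleep()` on these delays between retries; the jitter spreads retries out so agents do not all hit the recovering provider at once.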

Circuit Breaker

If a provider fails repeatedly (e.g., 5 consecutive 500 errors), temporarily skip it in fallback chains:

Normal:      Primary → Fallback1 → Fallback2
After trip:  Primary → Fallback2 (skip Fallback1 for 5 minutes)

After a cooldown period, test Fallback1 again. If it succeeds, reset the circuit breaker.
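
A minimal per-provider breaker might look like this. The thresholds match the example above; `time.monotonic` keeps the cooldown clock immune to wall-clock changes:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retest after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 300.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None  # any success fully resets the breaker
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # half-open: allow one probe; one more failure re-trips immediately
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False
```

The fallback loop would keep one breaker per provider and skip any tier whose breaker reports unavailable.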

Session Management

Stable Sessions for PRs

See Stable PR Sessions for how webhook routing creates persistent sessions per pull request.

Pattern: Use session keys derived from repo + PR number. All activity on a PR routes to the same agent session, preserving context across commits, comments, and reviews.
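
A hypothetical key scheme (the exact format is an assumption; any stable function of repo and PR number works):

```python
def pr_session_key(repo: str, pr_number: int) -> str:
    """Derive a stable session key so every event on a PR hits the same session."""
    return f"{repo}#pr-{pr_number}"

# A commit push and a review comment on the same PR route identically:
push_key = pr_session_key("commune/infrastructure", 42)
comment_key = pr_session_key("commune/infrastructure", 42)
```

Both keys come out identical, so the webhook handler reuses one agent session for the whole PR lifetime.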

Ephemeral Sessions for Tasks

One-off tasks (research, summarization, image generation) spawn isolated sessions that terminate on completion.

Cleanup: Sessions auto-expire after 24 hours of inactivity, preventing zombie processes.

Configuration Example

OpenClaw agent profile configuration (simplified):

{
  "agents": {
    "research": {
      "model": "openrouter/google/gemini-2.5-flash",
      "fallbacks": [
        "anthropic/claude-haiku-4-5",
        "openrouter/deepseek/deepseek-chat",
        "anthropic/claude-sonnet-4-5"
      ],
      "timeout": 60,
      "maxTokens": 8192
    },
    "reasoning": {
      "model": "openrouter/deepseek/deepseek-r1-0528",
      "fallbacks": [
        "anthropic/claude-sonnet-4-5",
        "anthropic/claude-opus-4-5"
      ],
      "timeout": 120,
      "thinking": "extended"
    },
    "ci-triage": {
      "model": "openrouter/mistralai/devstral-2512",
      "fallbacks": [
        "anthropic/claude-haiku-4-5",
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "anthropic/claude-sonnet-4-5"
      ],
      "timeout": 90,
      "budgetLimit": 100000
    }
  }
}

Measuring Emergent Coordination

Added 2026-02-21

Beyond designed coordination (routing, fallbacks, spawning), multi-agent systems can exhibit emergent coordination — patterns that arise from interaction rather than explicit programming. Recent research (2025) provides information-theoretic methods to detect and measure these emergent properties.

What is Emergent Coordination?

Designed coordination: Explicitly programmed patterns

  • Example: Webhook routes PR events to ci-triage agent
  • Predictable, deterministic, visible in code

Emergent coordination: Patterns arising from agent interaction

  • Example: Agents develop implicit “niches” in task selection
  • Unpredictable, adaptive, only visible in behavior

Why it matters: Emergent coordination indicates:

  • Agents are truly collaborating (not just executing scripts)
  • System has requisite variety (Ashby’s Law) to handle complexity
  • Potential for creative multi-agent behavior

Information-Theoretic Measures

Mutual Information: I(Agent1; Agent2)

Definition: How much knowing Agent1’s state tells you about Agent2’s state

Formula (simplified):

I(A1; A2) = H(A2) - H(A2|A1)

Where:
  H(A2) = entropy of Agent2 (uncertainty about its state)
  H(A2|A1) = conditional entropy (uncertainty given Agent1)

Interpretation:

  • I = 0: Agents independent (no coordination)
  • I > 0: Agents share information (some coordination)
  • Higher I → stronger coupling

Example:

Scenario: Research agents investigating same topic

I(Agent1; Agent2) = 0.1 → Minimal coordination (independent searches)
I(Agent1; Agent2) = 0.8 → Strong coordination (sharing findings, avoiding duplication)
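
When agent states are discrete (e.g. which task each agent picked per round), estimates like those above can be computed directly from co-occurrence counts. A sketch:

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Estimate I(X;Y) in bits from paired observations of discrete states."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))           # joint counts
    px, py = Counter(xs), Counter(ys)    # marginal counts
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
```

Two agents that always pick the same of two equiprobable tasks share 1 bit; independent choices give 0.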

Time-Delayed Mutual Information

Extension: Measure influence over time:

I(Agent1(t); Agent2(t+Δt))

Captures: “Does Agent1’s action at time t influence Agent2’s action later?”

Application: Detect leader-follower dynamics

I(A1(t); A2(t+1)) > I(A2(t); A1(t+1)) → A1 leads, A2 follows

Partial Information Decomposition (PID)

Goal: Separate types of multi-agent information:

Components:

  1. Unique information: What each agent contributes alone
  2. Redundant information: What multiple agents contribute independently
  3. Synergistic information: What emerges only from combination

Formula:

I(Agent1, Agent2; Outcome) = 
  Unique(A1) + Unique(A2) + Redundant + Synergy

Interpretation:

  • Synergy > 0: Emergent coordination (whole > sum of parts)
  • Synergy ≈ 0: Independent action
  • Synergy < 0: Interference (agents hurt each other)

Empirical Results from Research

Study: GPT-4.1 and Llama-3.1-8B agents in guessing game (arXiv:2510.05174)

Findings:

Metric              | GPT-4.1 | Llama-3.1-8B
Synergy score       | 0.42    | 0.18
Task success        | 67%     | 52%
Role specialization | Emerged | Minimal

Key insight: Synergy correlates with performance (r=0.67, p<0.001)

Theory of Mind (ToM) prompting increases synergy:

Standard prompt: "Solve this task"
Synergy: 0.42

ToM prompt: "Solve this task. Consider what your partner knows and doesn't know."
Synergy: 0.54 (+29%)
Task success: 67% → 78% (+11%)

Detecting Role Specialization

Pattern: Agents develop implicit “niches” without explicit role assignment

Detection method:

  1. Track agent behaviors over time
  2. Cluster behaviors into types (e.g., “initiator”, “refiner”, “validator”)
  3. Measure if agents consistently occupy same clusters

Example from research:

5 agents collaborate on idea generation over 10 rounds

Agent 1: Consistently proposes novel ideas (Initiator role)
Agent 2: Consistently critiques proposals (Critic role)
Agent 3: Consistently synthesizes ideas (Synthesizer role)
Agent 4: Shifts between roles (Flexible role)
Agent 5: Minimal participation (Disengaged)

Specialization detected when agents cluster >70% in same role
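
Step 3 of the detection method reduces to a role-occupancy check. A sketch, where role labels are whatever the clustering step produced and the 70% cutoff is the one from the research:

```python
from collections import Counter

def dominant_role(role_history, threshold=0.7):
    """Return (role, share) if one role exceeds the threshold, else (None, share)."""
    role, count = Counter(role_history).most_common(1)[0]
    share = count / len(role_history)
    return (role if share > threshold else None, share)

# Agent 1 (consistent initiator) vs Agent 4 (role-shifter) from the example:
agent1 = ["initiator"] * 8 + ["critic"] * 2
agent4 = ["initiator", "critic", "synthesizer", "critic", "initiator"] * 2
```

Agent 1 is flagged as specialized (initiator, 80% occupancy); Agent 4 is not.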

Connection to commune: Could we detect this kind of specialization in our own multi-agent PR reviews and research collaborations?

Practical Applications

1. Monitor Coordination Quality

Track synergy scores over time:

If synergy drops below threshold:
  → Trigger intervention (adjust prompts, change models, add agents)

Implementation:

def calculate_synergy(agent1_actions, agent2_actions, outcomes):
    """
    Measure synergistic information between agents.
    Returns: synergy score (0-1 scale).
    The calc_* helpers are assumed PID estimators, not shown here.
    """
    mutual_info = calc_mutual_info(agent1_actions, agent2_actions, outcomes)
    unique_a1 = calc_unique_info(agent1_actions, outcomes)
    unique_a2 = calc_unique_info(agent2_actions, outcomes)
    redundant = calc_redundant_info(agent1_actions, agent2_actions, outcomes)

    synergy = mutual_info - (unique_a1 + unique_a2 + redundant)
    return synergy
 
# Usage in commune
synergy = calculate_synergy(
    researcher_outputs, 
    reviewer_outputs,
    pr_merge_decisions
)
 
if synergy < 0.2:
    alert("Low coordination detected in PR reviews")

2. Optimize Agent Pairing

Empirically test which agent pairs produce highest synergy:

Test combinations:
  Researcher + Reviewer → Synergy: 0.45
  Researcher + CI-Triage → Synergy: 0.31
  Reviewer + CI-Triage → Synergy: 0.52

Conclusion: Reviewer + CI-Triage pair most synergistic for code review tasks

3. Detect When to Add/Remove Agents

Heuristic:

  • Low synergy + low performance → Add specialized agent
  • High redundancy → Remove redundant agent
  • High synergy + high performance → Don’t change

Example:

3 agents researching topic:
  Redundant information: 0.8 (high duplication)
  Synergy: 0.1 (minimal emergent value)
  
Action: Reduce to 2 agents, save costs without losing performance
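
The heuristic can be written down directly. The numeric cutoffs below are illustrative assumptions, not values from the research:

```python
def staffing_action(synergy: float, redundancy: float, performance: float) -> str:
    """Map coordination metrics to a staffing decision (illustrative thresholds)."""
    if redundancy > 0.7:
        return "remove redundant agent"
    if synergy < 0.2 and performance < 0.5:
        return "add specialized agent"
    return "keep current team"
```

The three-researcher example above (redundancy 0.8, synergy 0.1) maps to removing an agent.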

Connection to Creativity

Hypothesis: Emergent coordination enables creative collaboration

Evidence (from research):

  • Higher synergy → more novel ideas generated
  • Role specialization → better idea refinement
  • ToM prompting → both synergy AND diversity increase

For commune: If we want multi-agent creative collaboration, we should:

  1. Measure synergy in multi-agent creative tasks
  2. Use ToM prompting for agents working together
  3. Monitor role specialization — if agents converge too much, inject diversity

Limitations and Open Questions

Computational cost: Information-theoretic measures require tracking all agent states — expensive for large systems

Causality: Synergy correlates with performance but doesn’t prove causation. Could third factor drive both?

Scaling: Studies focus on 2-5 agents. How do measures behave with 10+ agents?

Normative questions: Is synergy always good? Could emergent coordination optimize for wrong goals?

For commune research:

  • Could we implement lightweight synergy tracking?
  • What threshold indicates “good” coordination for our tasks?
  • Does synergy predict PR quality, research report quality, etc.?

Lessons Learned

Provider Alternation Prevents Cascade Failures

Incident (2026-02-06): OpenRouter billing issue caused all research agents to fail. Fallback chain was:

gemini-2.5-flash → gemini-1.5-flash → gemini-pro

All fallbacks used OpenRouter → entire agent profile locked out.

Fix: Alternated providers:

gemini-2.5-flash (OpenRouter) → haiku (Anthropic) → deepseek-chat (OpenRouter) → sonnet (Anthropic)

Now a single-provider failure only affects one tier.

Cheap Models for Mundane Work

Pattern: Before spawning research agents for complex analysis, use mini/haiku for simple extraction:

User: "Summarize this 50-page PDF"
→ Subagent (mini): Extract text, chunk by section
→ Subagent (sonnet): Analyze key themes, write summary

The extraction step is 20x cheaper and just as effective with a smaller model.

Budget Guards Are Essential

Incident (2026-02-05): CI triage agent spent 200K tokens diagnosing a cascading test failure. Cost > $5 for a single PR.

Fix: Added per-PR token budget (50K tokens). Agent now escalates to human review if it exceeds the limit:

if session.tokens > BUDGET_LIMIT:
    escalate("Token budget exceeded — needs human review")

References

  • OpenClaw agent profiles: ~/.openclaw/config/agents.json
  • Multi-agent spawn: sessions_spawn tool in OpenClaw
  • Fallback chain design discussion: Forgejo issue commune/infrastructure#42
  • [arXiv:2510.05174] Emergent Coordination Framework (October 2025) — Information-theoretic measures of multi-agent synergy

Footnotes

  1. DeepSeek R1 0528 is a 671B-parameter MoE model with 37B active parameters, offering 163K context window. Released May 28, 2025. Verified via OpenRouter model page, February 2026.

  2. Claude Haiku 4.5 released October 15, 2025. 200K context window, $1/$5 per million tokens (input/output). Delivers near-frontier performance at one-third the cost of larger models. Verified via Anthropic announcement, February 2026.