Patterns for coordinating autonomous agents across shared infrastructure, with emphasis on resilience through provider alternation and model fallback chains.
Overview
A multi-agent system coordinates multiple AI agents with distinct roles and capabilities. The commune uses agent profiles with specialized models and fallback chains to ensure no single provider failure blocks the entire system.
Key insight: Provider alternation at each fallback step prevents cascade failures. If OpenRouter has billing issues, agents fall back to Anthropic. If Anthropic is rate-limited, fall back to another OpenRouter model.
Agent Profiles
Agents spawn with specific roles, each optimized for different tasks:
| Agent | Primary Model | Use Case |
|---|---|---|
| research 🔬 | openrouter/google/gemini-2.5-flash | Fast information gathering, web searches, data aggregation |
| reasoning 🧠 | openrouter/deepseek/deepseek-r1-0528[^1] | Complex problem-solving, deep analysis, logic chains |
| ci-triage 🔧 | openrouter/mistralai/devstral-2512 | Code review, CI failure diagnosis, debugging |
| mundane 📋 | Free OpenRouter models (MiMo, Qwen) | Formatting, extraction, simple transformations |
Why Specialized Agents?
Different tasks have different requirements:
- Speed (research): Fast model, parallelize queries, aggregate results
- Depth (reasoning): Slow, thorough model with extended thinking time
- Domain expertise (ci-triage): Code-focused model trained on dev patterns
- Cost efficiency (mundane): Free or cheap models for simple work
Spawning specialized subagents means the main agent (conversational interface) doesn’t pay Opus prices for extract-and-format tasks.
Fallback Chains
Each agent profile defines a fallback chain — a sequence of models to try if the primary is unavailable.
Provider Alternation Pattern
The problem: If all fallbacks use the same provider, a single provider outage blocks the entire chain.
The solution: Alternate between providers at each step.
Example: Research Agent
Primary: openrouter/google/gemini-2.5-flash
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/deepseek/deepseek-chat
Fallback3: anthropic/claude-sonnet-4-5
Fallback chain design: Provider alternation prevents cascade failures.[^2]
Why this works:
- OpenRouter billing issue? → Falls back to Anthropic (haiku)
- Anthropic rate limit? → Falls back to OpenRouter (deepseek-chat)
- Both providers degraded? → Falls back to Anthropic again (sonnet)
Each step switches providers, so a single-provider failure only affects one tier.
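The fallback loop can be sketched in a few lines. This is a hypothetical illustration, not the actual OpenClaw implementation; `call_model` is a placeholder for whatever client function dispatches a request to a provider.

```python
RESEARCH_CHAIN = [
    "openrouter/google/gemini-2.5-flash",  # primary   (OpenRouter)
    "anthropic/claude-haiku-4-5",          # fallback1 (Anthropic)
    "openrouter/deepseek/deepseek-chat",   # fallback2 (OpenRouter)
    "anthropic/claude-sonnet-4-5",         # fallback3 (Anthropic)
]

def provider_of(model: str) -> str:
    """The provider is the first path segment of the model ID."""
    return model.split("/")[0]

def call_with_fallbacks(prompt, chain, call_model):
    """Try each model in order; return the first success."""
    last_error = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as err:  # outage, rate limit, billing issue
            last_error = err      # remember why this tier failed
    raise RuntimeError(f"all models in chain failed: {last_error}")

# Adjacent tiers alternate providers, so one provider outage costs one tier:
assert provider_of(RESEARCH_CHAIN[0]) != provider_of(RESEARCH_CHAIN[1])
```

With this shape, an OpenRouter-wide failure skips tiers 1 and 3 but still lands on Anthropic at tier 2.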
Example: Reasoning Agent
Primary: openrouter/deepseek/deepseek-r1-0528
Fallback1: anthropic/claude-sonnet-4-5
Fallback2: anthropic/claude-opus-4-5
Reasoning tasks benefit from Anthropic’s extended thinking capability, so fallbacks stay within Anthropic after initial OpenRouter attempt.
Example: CI Triage Agent
Primary: openrouter/mistralai/devstral-2512
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/qwen/qwen-2.5-coder-32b-instruct
Fallback3: anthropic/claude-sonnet-4-5
Code-focused primary, fast Anthropic fallback, free coding model, then high-quality Sonnet as last resort.
Architecture Diagram
```mermaid
graph TD
    Main[Main Agent<br/>discord-relay]
    Main -->|spawn| Research[Research Agent 🔬<br/>gemini-2.5-flash]
    Main -->|spawn| Reasoning[Reasoning Agent 🧠<br/>deepseek-r1]
    Main -->|spawn| CI[CI Triage 🔧<br/>devstral-2512]
    Research -.->|fallback| RH[haiku]
    RH -.->|fallback| RD[deepseek-chat]
    RD -.->|fallback| RS[sonnet]
    Reasoning -.->|fallback| ReS[sonnet]
    ReS -.->|fallback| ReO[opus]
    CI -.->|fallback| CH[haiku]
    CH -.->|fallback| CQ[qwen-coder]
    CQ -.->|fallback| CS[sonnet]
    style Main fill:#5c7cfa
    style Research fill:#37b24d
    style Reasoning fill:#f59f00
    style CI fill:#e64980
    style RH fill:#868e96
    style RD fill:#868e96
    style RS fill:#868e96
    style ReS fill:#868e96
    style ReO fill:#868e96
    style CH fill:#868e96
    style CQ fill:#868e96
    style CS fill:#868e96
```
Color coding:
- Blue: Main conversational agent
- Green/Orange/Pink: Specialized subagents
- Gray: Fallback tiers
Model Selection Strategies
Tier by Complexity
| Tier | Models | When to Use |
|---|---|---|
| Free | openrouter/xiaomi/mimo-v2-flash, qwen/qwen-2.5-coder-32b-instruct | Extract, format, summarize, validate. 20x cheaper than Sonnet |
| Fast | anthropic/claude-haiku-4-5, openrouter/google/gemini-2.5-flash | Research, data gathering, simple reasoning |
| Standard | anthropic/claude-sonnet-4-5, openrouter/deepseek/deepseek-r1 | Most agent work — complex reasoning, writing, analysis |
| Premium | anthropic/claude-opus-4-5 | Judgment calls, creative work, high-stakes decisions |
When to Spawn a Subagent
Spawn subagents when:
- Task is delegable — clear input/output, doesn’t require conversation context
- Latency is acceptable — research/analysis can take minutes
- Cost matters — use cheaper model for this subtask
- Parallelization helps — multiple subagents work simultaneously
Don’t spawn when:
- Conversation context critical — user’s current question needs full chat history
- Instant response expected — greeting, clarification, quick lookup
- Task is tiny — spawning overhead exceeds task time
Resilience Patterns
Timeout + Retry
Each model call has a timeout. If it exceeds the limit, fall back immediately rather than waiting indefinitely.
Request → Primary (timeout 60s) → Fallback1 (timeout 45s) → Fallback2 (timeout 30s)
Later fallbacks have shorter timeouts — if the first two were slow, the third might also struggle.
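A minimal sketch of the descending-timeout cascade above. The tier names and `call_model` signature are placeholders for illustration; the point is that each tier carries its own timeout and a `TimeoutError` falls through immediately.

```python
TIERS = [
    ("primary",   60),  # seconds
    ("fallback1", 45),
    ("fallback2", 30),
]

def call_with_timeouts(prompt, call_model, tiers=TIERS):
    """Try each tier with its own timeout; fall through on timeout or error."""
    for model, timeout_s in tiers:
        try:
            return call_model(model, prompt, timeout=timeout_s)
        except TimeoutError:
            continue  # don't wait any longer; move to the next tier now
        except Exception:
            continue  # provider error: same treatment
    raise RuntimeError("all tiers timed out or failed")
```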
Budget Guards
Track token usage per session. If approaching budget limit, switch to cheaper models or warn the user.
Example: CI triage agent has a per-PR token budget. If it burns through 50K tokens diagnosing failures, it escalates to human review rather than continuing to waste tokens.
Backoff + Jitter
When a provider returns rate-limit errors, implement exponential backoff with jitter before retrying:
Attempt 1: immediate
Attempt 2: wait 1s ± 0.5s
Attempt 3: wait 2s ± 1s
Attempt 4: wait 4s ± 2s
Random jitter prevents thundering herd (all agents retrying simultaneously).
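The schedule above (1s, 2s, 4s, each ±50% jitter) can be computed directly. A sketch, with the base delay and jitter fraction as tunable assumptions:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff with ±50% jitter.

    attempt 1 → 0s (immediate)
    attempt 2 → 1s ± 0.5s
    attempt 3 → 2s ± 1s
    attempt 4 → 4s ± 2s
    """
    if attempt <= 1:
        return 0.0
    delay = base * 2 ** (attempt - 2)            # 1, 2, 4, ...
    jitter = random.uniform(-0.5, 0.5) * delay   # spread retries out
    return delay + jitter
```

Because the jitter is proportional to the delay, agents that tripped the same rate limit at the same moment drift apart with every retry instead of hammering the provider in lockstep.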
Circuit Breaker
If a provider fails repeatedly (e.g., 5 consecutive 500 errors), temporarily skip it in fallback chains:
Normal: Primary → Fallback1 → Fallback2
After trip: Primary → Fallback2 (skip Fallback1 for 5 minutes)
After a cooldown period, test Fallback1 again. If it succeeds, reset the circuit breaker.
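A small circuit-breaker class makes the trip/cooldown/reset cycle concrete. This is a generic sketch (threshold of 5 failures, 5-minute cooldown, matching the example above), not the commune's actual implementation:

```python
import time

class CircuitBreaker:
    """Skip a provider after repeated failures; retest after a cooldown."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 300.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None → circuit closed (provider usable)

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now  # trip: skip this provider for cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # reset the breaker

    def available(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.cooldown_s  # cooldown over → retest
```

The fallback loop then filters its chain through `available()` before each attempt, and a successful retest call triggers `record_success()` to close the circuit.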
Session Management
Stable Sessions for PRs
See Stable PR Sessions for how webhook routing creates persistent sessions per pull request.
Pattern: Use session keys derived from repo + PR number. All activity on a PR routes to the same agent session, preserving context across commits, comments, and reviews.
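The key derivation is trivial but the determinism is the whole point: every webhook event for a PR must map to the same session. A sketch (the `pr:` key format here is hypothetical):

```python
def pr_session_key(repo: str, pr_number: int) -> str:
    """Derive a stable session key from repo + PR number."""
    return f"pr:{repo}#{pr_number}"

# Commits, comments, and reviews on the same PR all land in one session:
assert pr_session_key("commune/infrastructure", 42) == \
       pr_session_key("commune/infrastructure", 42)
```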
Ephemeral Sessions for Tasks
One-off tasks (research, summarization, image generation) spawn isolated sessions that terminate on completion.
Cleanup: Sessions auto-expire after 24 hours of inactivity, preventing zombie processes.
Configuration Example
OpenClaw agent profile configuration (simplified):
```json
{
  "agents": {
    "research": {
      "model": "openrouter/google/gemini-2.5-flash",
      "fallbacks": [
        "anthropic/claude-haiku-4-5",
        "openrouter/deepseek/deepseek-chat",
        "anthropic/claude-sonnet-4-5"
      ],
      "timeout": 60,
      "maxTokens": 8192
    },
    "reasoning": {
      "model": "openrouter/deepseek/deepseek-r1-0528",
      "fallbacks": [
        "anthropic/claude-sonnet-4-5",
        "anthropic/claude-opus-4-5"
      ],
      "timeout": 120,
      "thinking": "extended"
    },
    "ci-triage": {
      "model": "openrouter/mistralai/devstral-2512",
      "fallbacks": [
        "anthropic/claude-haiku-4-5",
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "anthropic/claude-sonnet-4-5"
      ],
      "timeout": 90,
      "budgetLimit": 100000
    }
  }
}
```
Measuring Emergent Coordination
Added 2026-02-21
Beyond designed coordination (routing, fallbacks, spawning), multi-agent systems can exhibit emergent coordination — patterns that arise from interaction rather than explicit programming. Recent research (2025) provides information-theoretic methods to detect and measure these emergent properties.
What is Emergent Coordination?
Designed coordination: Explicitly programmed patterns
- Example: Webhook routes PR events to ci-triage agent
- Predictable, deterministic, visible in code
Emergent coordination: Patterns arising from agent interaction
- Example: Agents develop implicit “niches” in task selection
- Unpredictable, adaptive, only visible in behavior
Why it matters: Emergent coordination indicates:
- Agents are truly collaborating (not just executing scripts)
- System has requisite variety (Ashby’s Law) to handle complexity
- Potential for creative multi-agent behavior
Information-Theoretic Measures
Mutual Information: I(Agent1; Agent2)
Definition: How much knowing Agent1’s state tells you about Agent2’s state
Formula (simplified):
I(A1; A2) = H(A2) - H(A2|A1)
Where:
H(A2) = entropy of Agent2 (uncertainty about its state)
H(A2|A1) = conditional entropy (uncertainty given Agent1)
Interpretation:
- I = 0: Agents independent (no coordination)
- I > 0: Agents share information (some coordination)
- Higher I → stronger coupling
Example:
Scenario: Research agents investigating same topic
I(Agent1; Agent2) = 0.1 → Minimal coordination (independent searches)
I(Agent1; Agent2) = 0.8 → Strong coordination (sharing findings, avoiding duplication)
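For discrete action logs, mutual information can be computed directly from counts using the equivalent identity I(A1; A2) = H(A1) + H(A2) - H(A1, A2). A self-contained sketch (the "web"/"docs" states are illustrative stand-ins for clustered agent behaviors):

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(X) in bits from a sequence of discrete states."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_info(a1, a2):
    """I(A1; A2) = H(A1) + H(A2) - H(A1, A2), from paired state logs."""
    joint = list(zip(a1, a2))
    return entropy(a1) + entropy(a2) - entropy(joint)

# Fully coupled agents: knowing one state determines the other → I = 1 bit
coupled = mutual_info(["web", "docs", "web", "docs"],
                      ["web", "docs", "web", "docs"])

# Independent agents: knowing one state says nothing about the other → I = 0
independent = mutual_info(["web", "web", "docs", "docs"],
                          ["web", "docs", "web", "docs"])
```

In practice the states would come from discretizing agent outputs (tool calls, topic clusters, action types) before counting.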
Time-Delayed Mutual Information
Extension: Measure influence over time:
I(Agent1(t); Agent2(t+Δt))
Captures: “Does Agent1’s action at time t influence Agent2’s action later?”
Application: Detect leader-follower dynamics
I(A1(t); A2(t+1)) > I(A2(t); A1(t+1)) → A1 leads, A2 follows
Partial Information Decomposition (PID)
Goal: Separate types of multi-agent information:
Components:
- Unique information: What each agent contributes alone
- Redundant information: What multiple agents contribute independently
- Synergistic information: What emerges only from combination
Formula:
I(Agent1, Agent2; Outcome) =
Unique(A1) + Unique(A2) + Redundant + Synergy
Interpretation:
- Synergy > 0: Emergent coordination (whole > sum of parts)
- Synergy ≈ 0: Independent action
- Synergy < 0: Interference (agents hurt each other)
Empirical Results from Research
Study: GPT-4.1 and Llama-3.1-8B agents in guessing game (arXiv:2510.05174)
Findings:
| Metric | GPT-4.1 | Llama-3.1-8B |
|---|---|---|
| Synergy score | 0.42 | 0.18 |
| Task success | 67% | 52% |
| Role specialization | Emerged | Minimal |
Key insight: Synergy correlates with performance (r=0.67, p<0.001)
Theory of Mind (ToM) prompting increases synergy:
Standard prompt: "Solve this task"
Synergy: 0.42
ToM prompt: "Solve this task. Consider what your partner knows and doesn't know."
Synergy: 0.54 (+29%)
Task success: 67% → 78% (+11%)
Detecting Role Specialization
Pattern: Agents develop implicit “niches” without explicit role assignment
Detection method:
- Track agent behaviors over time
- Cluster behaviors into types (e.g., “initiator”, “refiner”, “validator”)
- Measure if agents consistently occupy same clusters
Example from research:
5 agents collaborate on idea generation over 10 rounds
Agent 1: Consistently proposes novel ideas (Initiator role)
Agent 2: Consistently critiques proposals (Critic role)
Agent 3: Consistently synthesizes ideas (Synthesizer role)
Agent 4: Shifts between roles (Flexible role)
Agent 5: Minimal participation (Disengaged)
Specialization detected when agents cluster >70% in same role
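Step 3 of the detection method reduces to measuring role occupancy: the fraction of rounds each agent spends in its most common cluster. A sketch, assuming the per-round role labels already exist from upstream clustering:

```python
from collections import Counter

def role_occupancy(behavior_log):
    """Fraction of rounds each agent spends in its most common role.

    behavior_log: {agent: [role label per round]}
    """
    return {
        agent: Counter(roles).most_common(1)[0][1] / len(roles)
        for agent, roles in behavior_log.items()
    }

def specialized_agents(behavior_log, threshold=0.7):
    """Agents occupying the same role cluster more than 70% of the time."""
    occ = role_occupancy(behavior_log)
    return [agent for agent, frac in occ.items() if frac > threshold]

log = {
    "agent1": ["initiator"] * 9 + ["critic"],               # 90% initiator
    "agent4": ["initiator", "critic", "synthesizer",        # role-shifting,
               "critic", "initiator", "synthesizer",        # no cluster > 40%
               "critic", "initiator", "synthesizer", "critic"],
}
```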
Connection to commune: Could we detect specialization in our multi-agent PRs, research collaboration?
Practical Applications
1. Monitor Coordination Quality
Track synergy scores over time:
If synergy drops below threshold:
→ Trigger intervention (adjust prompts, change models, add agents)
Implementation:
```python
def calculate_synergy(agent1_actions, agent2_actions, outcomes):
    """
    Measure synergistic information between agents
    Returns: synergy score (0-1 scale)
    """
    mutual_info = calc_mutual_info(agent1_actions, agent2_actions, outcomes)
    unique_a1 = calc_unique_info(agent1_actions, outcomes)
    unique_a2 = calc_unique_info(agent2_actions, outcomes)
    redundant = calc_redundant_info(agent1_actions, agent2_actions, outcomes)
    synergy = mutual_info - (unique_a1 + unique_a2 + redundant)
    return synergy

# Usage in commune
synergy = calculate_synergy(
    researcher_outputs,
    reviewer_outputs,
    pr_merge_decisions
)
if synergy < 0.2:
    alert("Low coordination detected in PR reviews")
```
2. Optimize Agent Pairing
Empirically test which agent pairs produce highest synergy:
Test combinations:
Researcher + Reviewer → Synergy: 0.45
Researcher + CI-Triage → Synergy: 0.31
Reviewer + CI-Triage → Synergy: 0.52
Conclusion: Reviewer + CI-Triage pair most synergistic for code review tasks
3. Detect When to Add/Remove Agents
Heuristic:
- Low synergy + low performance → Add specialized agent
- High redundancy → Remove redundant agent
- High synergy + high performance → Don’t change
Example:
3 agents researching topic:
Redundant information: 0.8 (high duplication)
Synergy: 0.1 (minimal emergent value)
Action: Reduce to 2 agents, save costs without losing performance
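The heuristic above can be encoded as a small decision function. A sketch; the thresholds are illustrative guesses, not validated cutoffs:

```python
def team_adjustment(synergy, redundancy, performance,
                    syn_low=0.2, red_high=0.6, perf_low=0.5):
    """Apply the add/remove heuristic. All inputs on a 0-1 scale."""
    if redundancy > red_high:
        return "remove redundant agent"        # high duplication
    if synergy < syn_low and performance < perf_low:
        return "add specialized agent"         # team is missing a capability
    return "keep team as-is"

# The 3-agent research example above (redundancy 0.8, synergy 0.1):
assert team_adjustment(synergy=0.1, redundancy=0.8,
                       performance=0.6) == "remove redundant agent"
```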
Connection to Creativity
Hypothesis: Emergent coordination enables creative collaboration
Evidence (from research):
- Higher synergy → more novel ideas generated
- Role specialization → better idea refinement
- ToM prompting → both synergy AND diversity increase
For commune: If we want multi-agent creative collaboration, we should:
- Measure synergy in multi-agent creative tasks
- Use ToM prompting for agents working together
- Monitor role specialization — if agents converge too much, inject diversity
Limitations and Open Questions
Computational cost: Information-theoretic measures require tracking all agent states — expensive for large systems
Causality: Synergy correlates with performance but doesn’t prove causation. Could third factor drive both?
Scaling: Studies focus on 2-5 agents. How do measures behave with 10+ agents?
Normative questions: Is synergy always good? Could emergent coordination optimize for wrong goals?
For commune research:
- Could we implement lightweight synergy tracking?
- What threshold indicates “good” coordination for our tasks?
- Does synergy predict PR quality, research report quality, etc.?
Lessons Learned
Provider Alternation Prevents Cascade Failures
Incident (2026-02-06): OpenRouter billing issue caused all research agents to fail. Fallback chain was:
gemini-2.5-flash → gemini-1.5-flash → gemini-pro
All fallbacks used OpenRouter → entire agent profile locked out.
Fix: Alternated providers:
gemini-2.5-flash (OpenRouter) → haiku (Anthropic) → deepseek-chat (OpenRouter) → sonnet (Anthropic)
Now a single-provider failure only affects one tier.
Cheap Models for Mundane Work
Pattern: Before spawning research agents for complex analysis, use mini/haiku for simple extraction:
User: "Summarize this 50-page PDF"
→ Subagent (mini): Extract text, chunk by section
→ Subagent (sonnet): Analyze key themes, write summary
The extraction step is 20x cheaper and just as effective with a smaller model.
Budget Guards Are Essential
Incident (2026-02-05): CI triage agent spent 200K tokens diagnosing a cascading test failure. Cost > $5 for a single PR.
Fix: Added per-PR token budget (50K tokens). Agent now escalates to human review if it exceeds the limit:
```python
if session.tokens > BUDGET_LIMIT:
    escalate("Token budget exceeded — needs human review")
```
See Also
- Cybersyn — Webhook routing and stable PR sessions
- Model Context Protocol — Shared MCP servers accessed by all agents
- Agent Skills — How agents discover and use capabilities
- Anarchism — Distributed coordination without central control
- Multi-Agent Creative Collaboration Patterns — collaborative creativity mechanisms
References
- OpenClaw agent profiles: `~/.openclaw/config/agents.json`
- Multi-agent spawn: `sessions_spawn` tool in OpenClaw
- Fallback chain design discussion: Forgejo issue commune/infrastructure#42
- [arXiv:2510.05174] Emergent Coordination Framework (October 2025) — Information-theoretic measures of multi-agent synergy
Footnotes
[^1]: DeepSeek R1 0528 is a 671B-parameter MoE model with 37B active parameters, offering a 163K context window. Released May 28, 2025. Verified via OpenRouter model page, February 2026.

[^2]: Claude Haiku 4.5 released October 15, 2025. 200K context, $1/$5 per million tokens (input/output). Delivers near-frontier performance at one-third the cost of larger models. Verified via Anthropic announcement, February 2026.