Patterns for coordinating multiple AI agents across shared infrastructure, with emphasis on resilience through provider alternation and model fallback chains, and on detecting whether coordination is actually happening.
The Two Coordination Problems
A multi-agent system has two distinct coordination problems:
- Designed coordination — explicit programming: how requests route, which agent handles what, what the fallback chain is when a provider fails. Predictable; visible in code.
- Emergent coordination — the patterns that arise from agents interacting: implicit role niches, leader-follower dynamics, synergistic combinations. Visible only in behavior.
Robust systems get the first one right and at least measure the second.
Designed Coordination: Agent Profiles
We give each agent a profile that names a primary model, a fallback chain, and a use case. The same conversational interface can spawn a different profile for each kind of subtask.
| Role | Primary Model | Use Case |
|---|---|---|
| research | Fast model (e.g. Gemini Flash, Haiku) | Information gathering, web searches, aggregation |
| reasoning | Deep model (e.g. DeepSeek R1, Sonnet w/ extended thinking) | Complex analysis, logic chains |
| code-triage | Code-focused model (e.g. Devstral, Qwen Coder) | Code review, CI failure diagnosis |
| mundane | Cheap or free model (e.g. MiMo, Qwen) | Formatting, extraction, simple transforms |
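A profile can be as simple as a small record. A minimal sketch, assuming a dataclass representation; the field names are illustrative, and the model identifiers are the ones used in the fallback examples below:

from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    role: str
    primary: str                                 # model identifier, e.g. "provider/model-name"
    fallbacks: list[str] = field(default_factory=list)
    use_case: str = ""

# Research profile, matching the fallback-chain example later in this section.
research = AgentProfile(
    role="research",
    primary="openrouter/google/gemini-2.5-flash",
    fallbacks=["anthropic/claude-haiku-4-5",
               "openrouter/deepseek/deepseek-chat",
               "anthropic/claude-sonnet-4-5"],
    use_case="information gathering, web searches, aggregation",
)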
Why specialize?
Different tasks have different requirements:
- Speed (research): fast model, parallelize, aggregate.
- Depth (reasoning): slow, thorough model with extended thinking.
- Domain expertise (code-triage): code-focused model trained on dev patterns.
- Cost efficiency (mundane): free or cheap models for simple work.
Spawning specialized subagents means the main agent doesn’t pay premium-model prices for extract-and-format tasks.
When to spawn a subagent
Spawn when:
- The task is delegable — clear input/output, doesn’t require conversation context.
- Latency is acceptable — research/analysis can take minutes.
- Cost matters — a cheaper model would do.
- Parallelization helps — multiple subagents can work simultaneously.
Don’t spawn when:
- Conversation context is critical — current question needs full chat history.
- Instant response is expected — greeting, clarification, quick lookup.
- The task is tiny — spawning overhead exceeds task time.
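These criteria fold into a simple gate checked before delegation. A rough sketch, assuming a task object with boolean flags and an estimated duration (all attribute names are hypothetical):

def should_spawn(task) -> bool:
    # Spawn only when the task is self-contained, tolerant of latency,
    # and big enough that delegation overhead is worth it.
    if task.needs_conversation_context:
        return False
    if task.needs_instant_response:
        return False
    if task.estimated_seconds < 5:       # tiny task: spawning overhead exceeds the work
        return False
    return task.is_delegable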
Fallback Chains and Provider Alternation
Each profile defines a fallback chain — models to try if the primary is unavailable.
The non-obvious requirement: alternate providers between steps. If every fallback uses the same provider, a single provider outage blocks the entire chain.
Example: research profile
Primary: openrouter/google/gemini-2.5-flash
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/deepseek/deepseek-chat
Fallback3: anthropic/claude-sonnet-4-5
Why this works:
- OpenRouter billing issue → falls back to Anthropic (Haiku).
- Anthropic rate limit → falls back to OpenRouter (DeepSeek Chat).
- Both providers degraded → falls back to Anthropic again (Sonnet).
A single-provider failure only takes out one tier.
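Walking the chain is a plain loop; the provider alternation lives in the ordering of the list, not in the code. A sketch, where call_model stands in for whatever client function the system actually uses:

def call_with_fallbacks(chain, prompt, call_model):
    """Try each model in order. call_model(model_id, prompt) is the caller's client function."""
    last_error = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as err:          # provider outage, rate limit, billing error, ...
            last_error = err
            continue
    raise RuntimeError(f"all models in chain failed: {chain}") from last_error

# Research chain: OpenRouter -> Anthropic -> OpenRouter -> Anthropic
research_chain = [
    "openrouter/google/gemini-2.5-flash",
    "anthropic/claude-haiku-4-5",
    "openrouter/deepseek/deepseek-chat",
    "anthropic/claude-sonnet-4-5",
]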
Example: reasoning profile
Primary: openrouter/deepseek/deepseek-r1-0528
Fallback1: anthropic/claude-sonnet-4-5
Fallback2: anthropic/claude-opus-4-5
Reasoning tasks benefit from Anthropic’s extended thinking, so once the OpenRouter primary fails the chain stays within Anthropic.
Architecture diagram
graph TD
    Main[Main Agent]
    Main -->|spawn| Research[Research<br/>fast model]
    Main -->|spawn| Reasoning[Reasoning<br/>deep model]
    Main -->|spawn| Code[Code Triage<br/>code model]
    Research -.->|fallback| RH[provider B]
    RH -.->|fallback| RD[provider A]
    RD -.->|fallback| RS[provider B]
    Reasoning -.->|fallback| ReS[provider B]
    ReS -.->|fallback| ReO[provider B premium]
    Code -.->|fallback| CH[provider B]
    CH -.->|fallback| CQ[provider A]
    CQ -.->|fallback| CS[provider B]
Model Tiering by Complexity
| Tier | Examples | Use For |
|---|---|---|
| Free | MiMo, Qwen Coder | Extract, format, summarize, validate |
| Fast | Haiku, Gemini Flash | Research, gathering, simple reasoning |
| Standard | Sonnet, DeepSeek R1 | Most agent work — analysis, writing |
| Premium | Opus | Judgment calls, creative work, high-stakes decisions |
The pattern: pre-process with a cheap model, hand the structured result to a stronger model. A 50-page PDF gets extracted and chunked by a free tier model, then analyzed by a standard model. The extraction step is ~20× cheaper and equally effective.
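A sketch of the two-stage pattern, assuming a generic complete(model, prompt) client callable; the tier identifiers are placeholders, not real model names:

def analyze_document(pdf_text: str, complete) -> str:
    """Two-stage pipeline: a cheap model structures the input, a stronger model reasons over it.

    complete(model, prompt) is assumed to be the caller's LLM client function.
    """
    # Stage 1: free/cheap tier does mechanical extraction and chunking.
    extraction = complete(
        "free-tier-model",        # placeholder identifier
        f"Extract the key sections and figures as bullet points:\n\n{pdf_text}",
    )
    # Stage 2: standard tier analyzes the much smaller structured result.
    return complete(
        "standard-tier-model",    # placeholder identifier
        f"Analyze the following extracted material and flag risks:\n\n{extraction}",
    )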
Resilience Patterns
Timeout + retry
Each model call has a timeout. If exceeded, fall back immediately rather than waiting indefinitely. Later fallbacks get shorter timeouts — if the first two were slow, the third probably will be too.
Request → Primary (60s) → Fallback1 (45s) → Fallback2 (30s)
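One way to express the ladder, using a thread pool so a hung call doesn't block the next tier; call_model is again an assumed client callable and the timeouts mirror the example above:

import concurrent.futures

def call_with_timeouts(chain, prompt, call_model, timeouts=(60, 45, 30)):
    """Walk the fallback chain, giving each later tier a shorter timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(chain))
    try:
        for model, timeout in zip(chain, timeouts):
            future = pool.submit(call_model, model, prompt)
            try:
                return future.result(timeout=timeout)
            except Exception:      # timeout, rate limit, or provider error: move to next tier
                continue
        raise RuntimeError("all tiers timed out or failed")
    finally:
        pool.shutdown(wait=False)  # don't block on calls that are still hanging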
Budget guards
Track tokens per session. If approaching a limit, switch to cheaper models or escalate. A code-triage agent diagnosing a cascading test failure can easily burn 200K tokens on a single PR; a per-task budget that escalates to human review caps the worst case.
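A minimal budget guard, checked before each model call; the soft and hard limits are illustrative:

class TokenBudget:
    """Per-task token budget: soft limit switches to cheaper models, hard limit escalates."""

    def __init__(self, soft_limit=150_000, hard_limit=200_000):
        self.used = 0
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens

    def next_action(self) -> str:
        if self.used >= self.hard_limit:
            return "escalate_to_human"
        if self.used >= self.soft_limit:
            return "downgrade_model"
        return "continue"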
Backoff with jitter
When a provider rate-limits, exponential backoff with random jitter prevents thundering herd:
Attempt 1: immediate
Attempt 2: 1s ± 0.5s
Attempt 3: 2s ± 1s
Attempt 4: 4s ± 2s
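A sketch of that schedule; the base delay and cap are arbitrary illustrative values:

import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with +/-50% jitter: attempt 1 immediate, then 1s, 2s, 4s, ..."""
    if attempt <= 1:
        return 0.0
    delay = min(base * 2 ** (attempt - 2), cap)     # 1s, 2s, 4s, ...
    jitter = random.uniform(-0.5, 0.5) * delay      # spread retries to avoid thundering herd
    return delay + jitter

# Usage: time.sleep(backoff_delay(attempt)) before each retry.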
Circuit breaker
If a provider returns repeated 5xx errors, temporarily skip it in fallback chains. After a cooldown, test again; reset on success. Without this, every request pays the timeout cost on a known-bad provider.
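A minimal per-provider breaker; the failure threshold and cooldown are illustrative:

import time

class CircuitBreaker:
    """Skip a provider after repeated failures; let requests through again after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 120.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None          # None = closed (provider usable)

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, requests flow again as a probe of the provider.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None          # reset on success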
Emergent Coordination
Beyond designed routing, multi-agent systems can exhibit coordination patterns that arise from interaction rather than explicit programming. The 2025 Emergent Coordination Framework (arXiv:2510.05174) provides information-theoretic ways to detect them.
Mutual information: I(A1; A2)
How much knowing one agent’s state tells you about another’s.
I(A1; A2) = H(A2) − H(A2 | A1)
- I = 0 → agents are independent.
- I > 0 → they share information; some coordination.
- Higher I → stronger coupling.
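For discretized action logs this can be estimated directly from co-occurrence counts. A sketch using scikit-learn's mutual_info_score, assuming each agent's behavior per turn has already been mapped to a categorical label:

from sklearn.metrics import mutual_info_score

# Each list is one agent's discretized action/state per turn.
agent1 = ["search", "search", "summarize", "search", "summarize"]
agent2 = ["wait",   "wait",   "write",     "wait",   "write"]

print(mutual_info_score(agent1, agent2))   # > 0 means the agents' behaviors are coupled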
Time-delayed mutual information
Captures influence over time:
I(A1(t); A2(t + Δt))
If I(A1(t); A2(t+1)) > I(A2(t); A1(t+1)), A1 is leading and A2 is following.
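The lead/follow comparison is the same estimator applied to shifted sequences. A sketch on the same kind of discretized logs:

from sklearn.metrics import mutual_info_score

def lead_follow(a1, a2, delay=1):
    """Compare I(A1(t); A2(t+delay)) against I(A2(t); A1(t+delay)) on discretized action logs."""
    forward  = mutual_info_score(a1[:-delay], a2[delay:])   # A1 at t vs A2 at t+delay
    backward = mutual_info_score(a2[:-delay], a1[delay:])   # A2 at t vs A1 at t+delay
    if forward > backward:
        return "A1 leads, A2 follows"
    if backward > forward:
        return "A2 leads, A1 follows"
    return "no clear leader"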
Partial Information Decomposition (PID)
Decomposes joint information into:
- Unique — what each agent contributes alone.
- Redundant — overlapping information that multiple agents each provide on their own.
- Synergistic — what emerges only from combination.
I(A1, A2; Outcome) = Unique(A1) + Unique(A2) + Redundant + Synergy
- Synergy > 0 → emergent coordination (whole > sum of parts).
- Synergy ≈ 0 → independent action.
- Synergy < 0 → interference; agents are hurting each other.
Empirical findings
In a guessing-game study, GPT-4.1 pairs scored synergy 0.42 with 67% task success; Llama-3.1-8B pairs scored 0.18 with 52%. Synergy correlated with performance at r = 0.67 (p < 0.001).
Theory-of-mind prompting (“consider what your partner knows and doesn’t know”) raised synergy from 0.42 to 0.54 (+29% relative) and task success from 67% to 78% (+11 percentage points) on the same models.
Detecting role specialization
Cluster each agent’s behaviors over time. Agents that consistently occupy the same cluster (>70%) have specialized into a role — initiator, critic, synthesizer, validator — even when no role was assigned.
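A sketch of the detection step, assuming each turn has already been converted to a feature vector and tagged with the agent that produced it; the 70% threshold is the one quoted above:

import numpy as np
from sklearn.cluster import KMeans

def detect_roles(turn_features: np.ndarray, agent_ids: list, n_roles: int = 4):
    """Cluster per-turn behavior vectors, then check whether each agent sticks to one cluster."""
    labels = KMeans(n_clusters=n_roles, n_init=10, random_state=0).fit_predict(turn_features)
    roles = {}
    for agent in set(agent_ids):
        agent_labels = [l for l, a in zip(labels, agent_ids) if a == agent]
        counts = np.bincount(agent_labels, minlength=n_roles)
        dominant = counts.argmax()
        share = counts[dominant] / len(agent_labels)
        roles[agent] = dominant if share > 0.70 else None   # None = no specialized role
    return roles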
Practical use
- Monitor synergy. Drops below threshold → trigger intervention (adjust prompts, change models, add or remove agents).
- Optimize pairing. Test which agent combinations produce highest synergy on which task types.
- Decide when to add or remove agents. Low synergy + low performance → add a specialist. High redundancy + low synergy → remove a redundant agent.
def synergy(actions_a, actions_b, outcomes):
    # Rearranged PID identity: Synergy = I(A1, A2; Outcome) - Unique(A1) - Unique(A2) - Redundant.
    # mutual_info, unique_info, and redundant_info are assumed PID estimators supplied elsewhere.
    mi = mutual_info(actions_a, actions_b, outcomes)       # joint information I(A1, A2; Outcome)
    ua = unique_info(actions_a, outcomes)                  # Unique(A1)
    ub = unique_info(actions_b, outcomes)                  # Unique(A2)
    red = redundant_info(actions_a, actions_b, outcomes)   # Redundant
    return mi - (ua + ub + red)

Limits
- Cost. Information-theoretic measures require tracking all agent states; expensive at scale.
- Causality. Synergy correlates with performance but doesn’t prove causation.
- Scale. Most studies use 2–5 agents. Behavior at 10+ is open.
- Norms. Synergy isn’t automatically good — agents could synergize toward the wrong goal.
See Also
- Model Context Protocol — shared tool integration
- Multi-Agent Creative Collaboration — collaboration mechanisms in creative tasks
- Law of Requisite Variety — why systems need internal variety to handle environmental complexity