Patterns for coordinating multiple AI agents across shared infrastructure, with emphasis on resilience through provider alternation and model fallback chains, and on detecting whether coordination is actually happening.
The Two Coordination Problems
A multi-agent system has two distinct coordination problems:
- Designed coordination — explicit programming: how requests route, which agent handles what, what the fallback chain is when a provider fails. Predictable; visible in code.
- Emergent coordination — the patterns that arise from agents interacting: implicit role niches, leader-follower dynamics, synergistic combinations. Visible only in behavior.
Robust systems get the first one right and at least measure the second.
Designed Coordination: Agent Profiles
We give each agent a profile that names a primary model, a fallback chain, and a use case. The same conversational interface can spawn a different profile for each kind of subtask.
| Role | Primary Model | Use Case |
|---|---|---|
| research | Fast model (e.g. Gemini Flash, Haiku) | Information gathering, web searches, aggregation |
| reasoning | Deep model (e.g. DeepSeek R1, Sonnet w/ extended thinking) | Complex analysis, logic chains |
| code-triage | Code-focused model (e.g. Devstral, Qwen Coder) | Code review, CI failure diagnosis |
| mundane | Cheap or free model (e.g. MiMo, Qwen) | Formatting, extraction, simple transforms |
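A profile can be as simple as a small record. A minimal sketch, assuming a dataclass representation; the field names are illustrative, and the model identifiers are the ones used in the fallback examples below:

from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    role: str
    primary: str                                 # model identifier, e.g. "provider/model-name"
    fallbacks: list[str] = field(default_factory=list)
    use_case: str = ""

# Research profile, matching the fallback-chain example later in this section.
research = AgentProfile(
    role="research",
    primary="openrouter/google/gemini-2.5-flash",
    fallbacks=["anthropic/claude-haiku-4-5",
               "openrouter/deepseek/deepseek-chat",
               "anthropic/claude-sonnet-4-5"],
    use_case="information gathering, web searches, aggregation",
)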
Why specialize?
Different tasks have different requirements:
- Speed (research): fast model, parallelize, aggregate.
- Depth (reasoning): slow, thorough model with extended thinking.
- Domain expertise (code-triage): code-focused model trained on dev patterns.
- Cost efficiency (mundane): free or cheap models for simple work.
Spawning specialized subagents means the main agent doesn’t pay premium-model prices for extract-and-format tasks.
When to spawn a subagent
Spawn when:
- The task is delegable — clear input/output, doesn’t require conversation context.
- Latency is acceptable — research/analysis can take minutes.
- Cost matters — a cheaper model would do.
- Parallelization helps — multiple subagents can work simultaneously.
Don’t spawn when:
- Conversation context is critical — current question needs full chat history.
- Instant response is expected — greeting, clarification, quick lookup.
- The task is tiny — spawning overhead exceeds task time.
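These criteria fold into a simple gate checked before delegation. A rough sketch, assuming a task object with boolean flags and an estimated duration (all attribute names are hypothetical):

def should_spawn(task) -> bool:
    # Spawn only when the task is self-contained, tolerant of latency,
    # and big enough that delegation overhead is worth it.
    if task.needs_conversation_context:
        return False
    if task.needs_instant_response:
        return False
    if task.estimated_seconds < 5:       # tiny task: spawning overhead exceeds the work
        return False
    return task.is_delegable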
Fallback Chains and Provider Alternation
Each profile defines a fallback chain — models to try if the primary is unavailable.
The non-obvious requirement: alternate providers between steps. If every fallback uses the same provider, a single provider outage blocks the entire chain.
Example: research profile
Primary: openrouter/google/gemini-2.5-flash
Fallback1: anthropic/claude-haiku-4-5
Fallback2: openrouter/deepseek/deepseek-chat
Fallback3: anthropic/claude-sonnet-4-5
Why this works:
- OpenRouter billing issue → falls back to Anthropic (Haiku).
- Anthropic rate limit → falls back to OpenRouter (DeepSeek Chat).
- Both providers degraded → falls back to Anthropic again (Sonnet).
A single-provider failure only takes out one tier.
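Walking the chain is a plain loop; the provider alternation lives in the ordering of the list, not in the code. A sketch, where call_model stands in for whatever client function the system actually uses:

def call_with_fallbacks(chain, prompt, call_model):
    """Try each model in order. call_model(model_id, prompt) is the caller's client function."""
    last_error = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as err:          # provider outage, rate limit, billing error, ...
            last_error = err
            continue
    raise RuntimeError(f"all models in chain failed: {chain}") from last_error

# Research chain: OpenRouter -> Anthropic -> OpenRouter -> Anthropic
research_chain = [
    "openrouter/google/gemini-2.5-flash",
    "anthropic/claude-haiku-4-5",
    "openrouter/deepseek/deepseek-chat",
    "anthropic/claude-sonnet-4-5",
]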
Example: reasoning profile
Primary: openrouter/deepseek/deepseek-r1-0528
Fallback1: anthropic/claude-sonnet-4-5
Fallback2: anthropic/claude-opus-4-5
Reasoning tasks benefit from Anthropic’s extended thinking, so once the OpenRouter primary fails the chain stays within Anthropic.
Architecture diagram
graph TD
    Main[Main Agent]
    Main -->|spawn| Research[Research<br/>fast model]
    Main -->|spawn| Reasoning[Reasoning<br/>deep model]
    Main -->|spawn| Code[Code Triage<br/>code model]
    Research -.->|fallback| RH[provider B]
    RH -.->|fallback| RD[provider A]
    RD -.->|fallback| RS[provider B]
    Reasoning -.->|fallback| ReS[provider B]
    ReS -.->|fallback| ReO[provider B premium]
    Code -.->|fallback| CH[provider B]
    CH -.->|fallback| CQ[provider A]
    CQ -.->|fallback| CS[provider B]
Model Tiering by Complexity
| Tier | Examples | Use For |
|---|---|---|
| Free | MiMo, Qwen Coder | Extract, format, summarize, validate |
| Fast | Haiku, Gemini Flash | Research, gathering, simple reasoning |
| Standard | Sonnet, DeepSeek R1 | Most agent work — analysis, writing |
| Premium | Opus | Judgment calls, creative work, high-stakes decisions |
The pattern: pre-process with a cheap model, hand the structured result to a stronger model. A 50-page PDF gets extracted and chunked by a free tier model, then analyzed by a standard model. The extraction step is ~20× cheaper and equally effective.
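A sketch of the two-stage pattern, assuming a generic complete(model, prompt) client callable; the tier identifiers are placeholders, not real model names:

def analyze_document(pdf_text: str, complete) -> str:
    """Two-stage pipeline: a cheap model structures the input, a stronger model reasons over it.

    complete(model, prompt) is assumed to be the caller's LLM client function.
    """
    # Stage 1: free/cheap tier does mechanical extraction and chunking.
    extraction = complete(
        "free-tier-model",        # placeholder identifier
        f"Extract the key sections and figures as bullet points:\n\n{pdf_text}",
    )
    # Stage 2: standard tier analyzes the much smaller structured result.
    return complete(
        "standard-tier-model",    # placeholder identifier
        f"Analyze the following extracted material and flag risks:\n\n{extraction}",
    )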
Resilience Patterns
Timeout + retry
Each model call has a timeout. If exceeded, fall back immediately rather than waiting indefinitely. Later fallbacks get shorter timeouts — if the first two were slow, the third probably will be too.
Request → Primary (60s) → Fallback1 (45s) → Fallback2 (30s)
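One way to express the ladder, using a thread pool so a hung call doesn't block the next tier; call_model is again an assumed client callable and the timeouts mirror the example above:

import concurrent.futures

def call_with_timeouts(chain, prompt, call_model, timeouts=(60, 45, 30)):
    """Walk the fallback chain, giving each later tier a shorter timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(chain))
    try:
        for model, timeout in zip(chain, timeouts):
            future = pool.submit(call_model, model, prompt)
            try:
                return future.result(timeout=timeout)
            except Exception:      # timeout, rate limit, or provider error: move to next tier
                continue
        raise RuntimeError("all tiers timed out or failed")
    finally:
        pool.shutdown(wait=False)  # don't block on calls that are still hanging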
Budget guards
Track tokens per session. If approaching a limit, switch to cheaper models or escalate. A code-triage agent diagnosing a cascading test failure can easily burn 200K tokens on a single PR; a per-task budget that escalates to human review caps the worst case.
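A minimal budget guard, checked before each model call; the soft and hard limits are illustrative:

class TokenBudget:
    """Per-task token budget: soft limit switches to cheaper models, hard limit escalates."""

    def __init__(self, soft_limit=150_000, hard_limit=200_000):
        self.used = 0
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens

    def next_action(self) -> str:
        if self.used >= self.hard_limit:
            return "escalate_to_human"
        if self.used >= self.soft_limit:
            return "downgrade_model"
        return "continue"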
Backoff with jitter
When a provider rate-limits, exponential backoff with random jitter prevents thundering herd:
Attempt 1: immediate
Attempt 2: 1s ± 0.5s
Attempt 3: 2s ± 1s
Attempt 4: 4s ± 2s
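A sketch of that schedule; the base delay and cap are arbitrary illustrative values:

import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with +/-50% jitter: attempt 1 immediate, then 1s, 2s, 4s, ..."""
    if attempt <= 1:
        return 0.0
    delay = min(base * 2 ** (attempt - 2), cap)     # 1s, 2s, 4s, ...
    jitter = random.uniform(-0.5, 0.5) * delay      # spread retries to avoid thundering herd
    return delay + jitter

# Usage: time.sleep(backoff_delay(attempt)) before each retry.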
Circuit breaker
If a provider returns repeated 5xx errors, temporarily skip it in fallback chains. After a cooldown, test again; reset on success. Without this, every request pays the timeout cost on a known-bad provider.
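A minimal per-provider breaker; the failure threshold and cooldown are illustrative:

import time

class CircuitBreaker:
    """Skip a provider after repeated failures; let requests through again after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 120.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None          # None = closed (provider usable)

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, requests flow again as a probe of the provider.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None          # reset on success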
Emergent Coordination
Beyond designed routing, multi-agent systems can exhibit coordination patterns that arise from interaction rather than explicit programming. The 2025 Emergent Coordination Framework (arXiv:2510.05174) provides information-theoretic ways to detect them.
Mutual information: I(A1; A2)
How much knowing one agent’s state tells you about another’s.
I(A1; A2) = H(A2) − H(A2 | A1)
- I = 0 → agents are independent.
- I > 0 → they share information; some coordination.
- Higher I → stronger coupling.
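For discretized action logs this can be estimated directly from co-occurrence counts. A sketch using scikit-learn's mutual_info_score, assuming each agent's behavior per turn has already been mapped to a categorical label:

from sklearn.metrics import mutual_info_score

# Each list is one agent's discretized action/state per turn.
agent1 = ["search", "search", "summarize", "search", "summarize"]
agent2 = ["wait",   "wait",   "write",     "wait",   "write"]

print(mutual_info_score(agent1, agent2))   # > 0 means the agents' behaviors are coupled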
Time-delayed mutual information
Captures influence over time:
I(A1(t); A2(t + Δt))
If I(A1(t); A2(t+1)) > I(A2(t); A1(t+1)), A1 is leading and A2 is following.
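The lead/follow comparison is the same estimator applied to shifted sequences. A sketch on the same kind of discretized logs:

from sklearn.metrics import mutual_info_score

def lead_follow(a1, a2, delay=1):
    """Compare I(A1(t); A2(t+delay)) against I(A2(t); A1(t+delay)) on discretized action logs."""
    forward  = mutual_info_score(a1[:-delay], a2[delay:])   # A1 at t vs A2 at t+delay
    backward = mutual_info_score(a2[:-delay], a1[delay:])   # A2 at t vs A1 at t+delay
    if forward > backward:
        return "A1 leads, A2 follows"
    if backward > forward:
        return "A2 leads, A1 follows"
    return "no clear leader"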
Partial Information Decomposition (PID)
Decomposes joint information into:
- Unique — what each agent contributes alone.
- Redundant — overlapping information that multiple agents each provide on their own.
- Synergistic — what emerges only from combination.
I(A1, A2; Outcome) = Unique(A1) + Unique(A2) + Redundant + Synergy
- Synergy > 0 → emergent coordination (whole > sum of parts).
- Synergy ≈ 0 → independent action.
- Synergy < 0 → interference; agents are hurting each other.
Empirical findings
In a guessing-game study, GPT-4.1 pairs scored synergy 0.42 with 67% task success; Llama-3.1-8B pairs scored 0.18 with 52%. Synergy correlated with performance at r = 0.67 (p < 0.001).
Theory-of-mind prompting (“consider what your partner knows and doesn’t know”) raised synergy from 0.42 to 0.54 (+29% relative) and task success from 67% to 78% (+11 percentage points) on the same models.
Detecting role specialization
Cluster each agent’s behaviors over time. Agents that consistently occupy the same cluster (>70%) have specialized into a role — initiator, critic, synthesizer, validator — even when no role was assigned.
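A sketch of the detection step, assuming each turn has already been converted to a feature vector and tagged with the agent that produced it; the 70% threshold is the one quoted above:

import numpy as np
from sklearn.cluster import KMeans

def detect_roles(turn_features: np.ndarray, agent_ids: list, n_roles: int = 4):
    """Cluster per-turn behavior vectors, then check whether each agent sticks to one cluster."""
    labels = KMeans(n_clusters=n_roles, n_init=10, random_state=0).fit_predict(turn_features)
    roles = {}
    for agent in set(agent_ids):
        agent_labels = [l for l, a in zip(labels, agent_ids) if a == agent]
        counts = np.bincount(agent_labels, minlength=n_roles)
        dominant = counts.argmax()
        share = counts[dominant] / len(agent_labels)
        roles[agent] = dominant if share > 0.70 else None   # None = no specialized role
    return roles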
Practical use
- Monitor synergy. Drops below threshold → trigger intervention (adjust prompts, change models, add or remove agents).
- Optimize pairing. Test which agent combinations produce highest synergy on which task types.
- Decide when to add or remove agents. Low synergy + low performance → add a specialist. High redundancy + low synergy → remove a redundant agent.
def synergy(actions_a, actions_b, outcomes):
    # Rearranged PID identity: Synergy = I(A1, A2; Outcome) - Unique(A1) - Unique(A2) - Redundant.
    # mutual_info, unique_info, and redundant_info are assumed PID estimators supplied elsewhere.
    mi = mutual_info(actions_a, actions_b, outcomes)       # joint information I(A1, A2; Outcome)
    ua = unique_info(actions_a, outcomes)                  # Unique(A1)
    ub = unique_info(actions_b, outcomes)                  # Unique(A2)
    red = redundant_info(actions_a, actions_b, outcomes)   # Redundant
    return mi - (ua + ub + red)

Limits
- Cost. Information-theoretic measures require tracking all agent states; expensive at scale.
- Causality. Synergy correlates with performance but doesn’t prove causation.
- Scale. Most studies use 2–5 agents. Behavior at 10+ is open.
- Norms. Synergy isn’t automatically good — agents could synergize toward the wrong goal.
See Also
- Model Context Protocol — shared tool integration
- Multi-Agent Creative Collaboration — collaboration mechanisms in creative tasks
- Law of Requisite Variety — why systems need internal variety to handle environmental complexity