Custom AI Agents

Part 5 — Orchestration

When do you actually need multiple agents — and what does a well-structured system look like?

10 min · Updated June 2026

It is tempting to architect everything as a swarm of specialised agents collaborating. Resist it.

5.1

My instinct is to build a team of agents — is that actually the right call?

15×

more tokens than a single conversation — multi-agent systems only make economic sense when the task value justifies the cost and the work is genuinely parallelisable.

Anthropic published analysis

Anthropic’s own published analysis is blunt: multi-agent systems use roughly 15× more tokens than a single chat, so they only make economic sense when the task’s value is high enough to justify that, and the work is genuinely parallelisable or too large for one context window.
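To make that trade-off concrete, here is a back-of-envelope sketch. The 15× multiplier is the figure quoted above; the token count and the price per million tokens are made-up placeholders, not real pricing for any model.

```python
MULTI_AGENT_TOKEN_MULTIPLIER = 15  # the multiplier quoted above


def run_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of one run that consumes `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: a 20,000-token chat at $1 per million tokens.
single_chat_tokens = 20_000
price = 1.0

single_cost = run_cost(single_chat_tokens, price)
multi_cost = run_cost(single_chat_tokens * MULTI_AGENT_TOKEN_MULTIPLIER, price)

print(f"single chat: ${single_cost:.2f}, multi-agent: ${multi_cost:.2f}")
```

The absolute numbers matter less than the ratio: whatever one chat costs, the multi-agent version has to return roughly fifteen times that in value before it pays for itself.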

The decision tree:

1. Can a workflow (predefined steps) solve it? Do that. Cheapest, most reliable, most auditable.
2. If not, can a single agent with good tools and context management solve it? Do that.
3. Only if the task is genuinely parallel, exceeds a single context window, or spans many complex tool domains should you reach for multi-agent.

Most teams skip straight to step three. That is the mistake.
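The three steps can be written down as a small function, which makes the ordering explicit. This is an illustrative sketch only: the Task fields and the "rescope" fallback are assumptions made for the example, not part of any framework.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # All fields are hypothetical labels for the questions in the decision tree.
    has_predefined_steps: bool       # step 1: a fixed workflow can solve it
    single_agent_sufficient: bool    # step 2: one agent with good tools can solve it
    genuinely_parallel: bool = False
    exceeds_context_window: bool = False
    spans_many_tool_domains: bool = False


def choose_architecture(task: Task) -> str:
    """Walk the decision tree in order, stopping at the first option that fits."""
    if task.has_predefined_steps:
        return "workflow"
    if task.single_agent_sufficient:
        return "single-agent"
    if (task.genuinely_parallel
            or task.exceeds_context_window
            or task.spans_many_tool_domains):
        return "multi-agent"
    # Assumption: if nothing above applies, revisit the task rather than add agents.
    return "rescope"


print(choose_architecture(Task(has_predefined_steps=False,
                               single_agent_sufficient=True)))  # single-agent
```

Because the checks run top to bottom, a task that could be a workflow never reaches the multi-agent branch, even when it also happens to be parallelisable.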

5.2

Before any code: the one mechanic that makes multi-agent systems work

Every pattern below is a different way of arranging agents, but they all rely on the same underlying mechanic, and if you understand this one thing the rest follows: agents don’t pass messages to each other directly — they read and write shared session state. One agent writes its result to a named key; the next agent reads that key. That’s it. The “orchestration” is just the wiring that decides who runs when, and the state is how their work actually connects.

In ADK this mechanic has two halves. An agent publishes its output to a key with output_key, and a later agent consumes it by referencing {key} in its instruction:

shared_state_example.py

from google.adk.agents import LlmAgent, SequentialAgent

# Agent A WRITES its answer to state["capital_city"].
agent_a = LlmAgent(
    name="Finder",
    model="gemini-flash-latest",
    instruction="Find the capital of France. Respond with just the city name.",
    output_key="capital_city",          # ← publishes result to shared state
)

# Agent B READS that same key. {capital_city} is substituted before the model sees it.
agent_b = LlmAgent(
    name="Describer",
    model="gemini-flash-latest",
    instruction="Tell me three facts about {capital_city}.",   # ← consumes it
)

pipeline = SequentialAgent(name="CityInfo", sub_agents=[agent_a, agent_b])
# Runs A → B. A writes "Paris" to state; B's instruction becomes "...about Paris".

Hold that output_key → {key} handoff in your head. Every pattern that follows is a variation on who writes and reads, and in what order the runner fires them.
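If the handoff still feels abstract, it can be modelled in a few lines of plain Python. This is a simplified mental model, not ADK's actual implementation: a dict stands in for session state, and render_instruction is a hypothetical helper that performs the {key} substitution.

```python
import re


def render_instruction(template: str, state: dict) -> str:
    """Replace each {key} in the template with the value stored under state[key]."""
    def lookup(match: re.Match) -> str:
        # Raises KeyError if no earlier agent has written this key yet.
        return str(state[match.group(1)])
    return re.sub(r"\{(\w+)\}", lookup, template)


state = {}

# Agent A finishes; its answer is stored under its output_key.
state["capital_city"] = "Paris"

# Agent B's instruction is filled in from state before the model sees it.
prompt = render_instruction("Tell me three facts about {capital_city}.", state)
print(prompt)  # Tell me three facts about Paris.
```

The KeyError case is worth noticing: a reader that runs before its writer has nothing to substitute, which is why the order the agents fire in matters as much as the keys themselves.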

5.3

When I do go multi-agent, what shapes actually work — and how do I build each one?

When you do go multi-agent, the field has converged on five recurring shapes. Below each is the ADK primitive that implements it, so the pattern isn’t just a picture — it’s something you can build.

Supervisor (orchestrator-workers)

The 2026 default. One orchestrator agent owns the overall task and full context; it spins up ephemeral, isolated worker sub-agents for sub-tasks, each of which returns a compressed summary. This works because it combines a single point of coherent control with clean context isolation.

There are two ways to build this in ADK, and the difference matters. With sub_agents, the orchestrator can delegate, handing the whole turn to a specialist and letting it respond to the user:

supervisor_delegation.py

from google.adk.agents import LlmAgent

billing_agent = LlmAgent(name="Billing", model="gemini-flash-latest",
    description="Handles billing, invoices, and refunds.",
    instruction="Resolve the customer's billing question.")

tech_agent = LlmAgent(name="TechSupport", model="gemini-flash-latest",
    description="Handles technical troubleshooting.",
    instruction="Diagnose and resolve the technical issue.")

# The supervisor delegates the whole turn to whichever specialist fits.
supervisor = LlmAgent(
    name="Supervisor",
    model="gemini-flash-latest",
    instruction="You coordinate a support team. Delegate billing questions to "
                "'Billing' and technical questions to 'TechSupport'. The 'description' "
                "of each sub-agent tells you what it handles.",
    sub_agents=[billing_agent, tech_agent],   # ← delegation targets
)

With AgentTool, the orchestrator instead calls a worker like a function, gets its result back, and stays in control of the conversation — this is the truer “orchestrator-workers” shape, because the supervisor never hands over the wheel:

supervisor_worker_as_tool.py

from google.adk.agents import LlmAgent
from google.adk.tools import agent_tool

research_worker = LlmAgent(name="Researcher", model="gemini-flash-latest",
    description="Researches a single narrow question and returns a short summary.",
    instruction="Research the question you are given and return a 3-sentence summary.")

# The supervisor calls the worker as a tool and keeps ownership of the task.
supervisor = LlmAgent(
    name="Orchestrator",
    model="gemini-flash-latest",
    instruction="Break the user's request into sub-questions. Use the Researcher "
                "tool for each, then synthesise the answers yourself.",
    tools=[agent_tool.AgentTool(agent=research_worker)],   # ← worker-as-tool
)

The rule of thumb: use sub_agents when a specialist should take over (support routing), and AgentTool when the supervisor must stay in charge and combine results (research, planning). This is the single most important distinction on the page for building a supervisor system correctly.
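Stripped of the framework, the distinction is about who produces the final reply. The sketch below uses plain functions as stand-ins for model-backed agents; billing, researcher, delegate and orchestrate are all hypothetical names for illustration and are not ADK calls.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # stand-in for a model-backed agent


def billing(request: str) -> str:
    return f"[Billing] handled: {request}"


def researcher(question: str) -> str:
    return f"summary of '{question}'"


def delegate(request: str, specialist: Agent) -> str:
    # sub_agents style: the specialist's reply IS the reply to the user.
    return specialist(request)


def orchestrate(questions: List[str], worker: Agent) -> str:
    # AgentTool style: each worker result comes back to the caller,
    # which combines the results and writes the final answer itself.
    findings = [worker(q) for q in questions]
    return "Synthesis: " + "; ".join(findings)


print(delegate("refund for invoice 12", billing))
print(orchestrate(["market size", "key competitors"], researcher))
```

In the first call the supervisor's job ends once it has picked a specialist; in the second it is still responsible for everything the user finally sees.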

Pipeline (sequential)

Staged refinement, where each agent’s output feeds the next — research → screen → schedule. Predictable and easy to reason about. This is SequentialAgent, and the state handoff is exactly the output_key → {key} mechanic from above:

pipeline.py

from google.adk.agents import LlmAgent, SequentialAgent

extract = LlmAgent(name="Extract", model="gemini-flash-latest",
    instruction="Extract the key clauses from the contract.", output_key="clauses")
analyse = LlmAgent(name="Analyse", model="gemini-flash-latest",
    instruction="Flag risky clauses in: {clauses}", output_key="risks")
draft   = LlmAgent(name="Draft", model="gemini-flash-latest",
    instruction="Draft redline suggestions for: {risks}")

contract_pipeline = SequentialAgent(name="ContractReview",
    sub_agents=[extract, analyse, draft])   # runs strictly in order, state flows forward

Fan-out (parallel)

Runs independent branches simultaneously and then merges them. The hard requirement is that the branches must be genuinely independent; if they need to coordinate mid-flight, this pattern breaks. In ADK this is ParallelAgent for the concurrent branches, usually wrapped in a SequentialAgent so a final agent can merge the results. Note how the merger reads both keys the parallel branches wrote:

fan_out.py

from google.adk.agents import LlmAgent, ParallelAgent, SequentialAgent

credit  = LlmAgent(name="Credit", model="gemini-flash-latest",
    instruction="Pull the applicant's credit summary.", output_key="credit")
fraud   = LlmAgent(name="Fraud", model="gemini-flash-latest",
    instruction="Run the fraud checks.", output_key="fraud")

# Branches run concurrently — only valid because credit and fraud don't depend on each other.
gather  = ParallelAgent(name="Checks", sub_agents=[credit, fraud])

decide  = LlmAgent(name="Decide", model="gemini-flash-latest",
    instruction="Approve or decline using credit={credit} and fraud={fraud}.")

underwriting = SequentialAgent(name="Underwriting", sub_agents=[gather, decide])
# gather runs both checks at once; decide waits for both, then reads both keys.
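The same fan-out-then-merge shape can be sketched with nothing but asyncio, which shows why independence is the hard requirement. The two check functions are stubs with invented results; a real branch would call a model or an API.

```python
import asyncio


async def credit_check(state: dict) -> None:
    await asyncio.sleep(0.01)        # stands in for a model or API call
    state["credit"] = "score 720"    # hypothetical result


async def fraud_check(state: dict) -> None:
    await asyncio.sleep(0.01)
    state["fraud"] = "no flags"      # hypothetical result


async def underwrite() -> str:
    state: dict = {}
    # Fan-out: both branches run concurrently, each writing only its own key.
    await asyncio.gather(credit_check(state), fraud_check(state))
    # Merge: runs only after gather returns, so both keys are present.
    return f"credit={state['credit']} fraud={state['fraud']}"


result = asyncio.run(underwrite())
print(result)  # credit=score 720 fraud=no flags
```

Neither branch reads the other's key, so the order in which they finish does not matter. If fraud_check needed the credit score, it would be a pipeline step, not a parallel branch.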

Debate

Has two agents argue a question and a third judge. Surprisingly effective for hard, subjective decisions, and cheap to wire up. Structurally it’s a fan-out of two opposing agents feeding a judge — a ParallelAgent (the two arguers writing pro and con) inside a SequentialAgent whose final agent reads both and rules.

Swarm

Uses peer-to-peer agents with shared state and no fixed hierarchy. Powerful but hard to control. Reserve it for back-office work, almost never for a customer-facing journey. There’s no single clean primitive for this — it’s the case where you accept custom control flow — which is itself a signal of how much harder it is to keep on the rails.

5.4

How do I stop an agent from grading its own homework?

A particularly useful specialisation of the supervisor pattern for long-running tasks is to separate the agent that plans, the agent that does the work, and a separate agent that judges the result. Separating the doer from the judge measurably reduces the “graded its own homework” failure — where an agent confidently rates its own bad output as good — which matters enormously for subjective outputs like legal drafting or financial commentary.

The “generate, then a different agent judges, then loop if not good enough” shape is LoopAgent in ADK. The key detail is how the loop knows to stop: a sub-agent signals completion by escalating (or calling an exit tool), and max_iterations is the safety cap so a never-satisfied evaluator can’t run forever:

loop_agent.py

from google.adk.agents import LoopAgent, LlmAgent
from google.adk.tools import exit_loop   # evaluator calls this when quality passes

generator = LlmAgent(name="Generator", model="gemini-flash-latest",
    # {feedback?} marks the key as optional: on the first pass no feedback is in state yet.
    instruction="Write or revise the legal summary. If feedback exists, address it: "
                "{feedback?}", output_key="draft")

evaluator = LlmAgent(name="Evaluator", model="gemini-flash-latest",
    instruction="Critique the draft: {draft}. If it meets the bar, call exit_loop. "
                "Otherwise write specific feedback.",
    output_key="feedback",
    tools=[exit_loop])                    # ← the judge, separate from the doer, ends the loop

refine_loop = LoopAgent(
    name="DraftRefine",
    max_iterations=3,                     # ← hard cap: never loop forever
    sub_agents=[generator, evaluator],    # generate → judge → (maybe) again
)

The planner sits in front of this as a third agent that decomposes the task first; wrap planner → refine_loop in a SequentialAgent and you have the full planner/generator/evaluator trio, with the doer and judge structurally separated so the doer can’t rubber-stamp its own work.
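The stopping logic of the refine loop is easy to see in a framework-free sketch. Here generate and evaluate are stub functions standing in for the two agents, and the "passes once it contains a citation" rule is invented purely so the example is deterministic.

```python
def run_refine_loop(generate, evaluate, max_iterations: int = 3) -> dict:
    """Generate, judge, and repeat until the judge passes the draft or the cap is hit."""
    state = {"feedback": ""}
    for iteration in range(1, max_iterations + 1):
        state["draft"] = generate(state["feedback"])
        passed, feedback = evaluate(state["draft"])
        state["iterations"] = iteration
        if passed:                  # plays the role of the evaluator ending the loop
            break
        state["feedback"] = feedback
    return state


# Stub "agents" (hypothetical): the draft passes once it contains a citation.
def generate(feedback: str) -> str:
    return "Summary with citation." if "citation" in feedback else "Summary."


def evaluate(draft: str):
    if "citation" in draft:
        return True, ""
    return False, "Add a citation."


final = run_refine_loop(generate, evaluate)
print(final["iterations"], final["draft"])  # 2 Summary with citation.
```

Swap in an evaluator that never passes and the loop still ends after max_iterations rounds, which is exactly what the cap is for.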

5.5

Inside a single agent, what patterns should I be composing?

Independent of how many agents you have, each agent’s internal behaviour draws on a small set of patterns from the canonical Anthropic taxonomy:

Prompt chaining: break a task into sequential steps. (In ADK: a SequentialAgent, or just an ordered instruction.)
Routing: classify the input, then dispatch to the right handler. This is hugely underused; many problems that get built as agents are really routing problems. (In ADK: an LlmAgent with sub_agents and clear descriptions, the same delegation code shown for the supervisor.)
Parallelisation: split into independent subtasks, or run the same task several times and vote. (In ADK: ParallelAgent.)
Evaluator-optimizer: generate, critique with a separate evaluator, refine in a loop. Essential for high-stakes drafting. (In ADK: LoopAgent, exactly as above.)
ReAct: the baseline reason → act → observe loop. (This is what a single LlmAgent with tools already does; it’s the loop from Part 1.)
Reflection: add a self-critique step; raises accuracy at the cost of latency. (A minimal LoopAgent, or a second instruction pass.)
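The voting variant of parallelisation needs very little machinery once the runs have finished. A minimal sketch, assuming each run returns a short label; the three answers below are invented for illustration.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical outputs from three runs of the same classification task.
runs = ["approve", "decline", "approve"]
print(majority_vote(runs))  # approve
```

Voting like this only works when answers are directly comparable, such as labels or short values. Free-form drafts need a judge instead of a counter.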

You compose these. A contract-review agent might route by contract type, chain through extraction → analysis → drafting, and run an evaluator-optimizer loop on the final language. None of this is exotic; it is deliberate composition of simple patterns.

Concretely, that contract-review agent is the pieces above, nested: an LlmAgent router picks the contract type and delegates to a type-specific SequentialAgent (extract → analyse → draft), whose final drafting step is itself a LoopAgent (generate → evaluate). Three primitives (routing, sequential, loop) composed into one system. That compositional structure, not any single clever agent, is what a well-built orchestration layer actually looks like.
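The routing and chaining layers of that composition can be sketched in plain Python to show how little is going on structurally. Everything here is a stand-in: classify is a keyword stub where a real router would be a model call, the risk words are invented, and the evaluator loop on the final draft is left out for brevity.

```python
from typing import Callable, Dict, List


def make_pipeline(risk_word: str) -> Callable[[str], List[str]]:
    """Build a type-specific extract -> analyse -> draft chain."""
    def pipeline(text: str) -> List[str]:
        clauses = [c.strip() for c in text.split(".") if c.strip()]   # extract
        risks = [c for c in clauses if risk_word in c.lower()]        # analyse
        return [f"Redline: {r}" for r in risks]                       # draft
    return pipeline


# One chain per contract type (the risk words are hypothetical).
PIPELINES: Dict[str, Callable[[str], List[str]]] = {
    "nda": make_pipeline("perpetual"),
    "lease": make_pipeline("penalty"),
}


def classify(text: str) -> str:
    # Stub router: a real system would ask a model to pick the contract type.
    return "nda" if "confidential" in text.lower() else "lease"


def review(text: str) -> List[str]:
    return PIPELINES[classify(text)](text)   # route, then run the chain


print(review("Confidential terms apply. Obligations are perpetual."))
# ['Redline: Obligations are perpetual']
```

The router never does the review itself and the chains never decide which contract type they are handling, which is the separation that keeps each piece simple.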

Libraries and frameworks referenced on this page

Google Agent Development Kit (ADK) — google-adk (Python) — used for every pattern. The core mechanic is shared session state: an agent publishes with output_key and a later agent consumes via {key} substitution in its instruction. Pattern-to-primitive mapping: pipeline → SequentialAgent; fan-out/parallelisation → ParallelAgent (usually nested in a SequentialAgent for the merge step); evaluator-optimizer/reflection → LoopAgent (terminated by a sub-agent escalating or calling exit_loop, capped by max_iterations); supervisor → either sub_agents (delegation — a specialist takes over the turn) or AgentTool (worker-as-function — the orchestrator stays in control); routing → LlmAgent with sub_agents and descriptive description fields. Model shown throughout: gemini-flash-latest.
Anthropic “Building Effective Agents” taxonomy: the source of most of the within-agent pattern names (prompt chaining, routing, parallelisation, evaluator-optimizer); ReAct and reflection come from the wider agent literature. The 15×-token multi-agent cost figure comes from Anthropic’s separate engineering write-up of its multi-agent research system. Cited conceptually; the patterns are framework-neutral and the ADK primitives above are one concrete way to implement them.