Multi-agent systems: topologies, tradeoffs, and when they actually help

"Multi-agent system" is the loudest phrase in AI engineering right now, yet most of the time a single agent or a plain workflow is the better call. This guide defines what a multi-agent system is, lays out the topologies you can build, and is honest about the cost, latency, and failure modes that decide whether splitting the work pays off.

By Kanika Mathur, Head of Service Delivery

Reviewed by Resourcifi engineering | Published May 27, 2026 | Updated May 27, 2026 | 12 min read

Key takeaways

The short version

A multi-agent system is an architecture where several autonomous agents, each with its own role, instructions, and tools, coordinate to complete a goal that is hard for one agent alone. In LLM systems each agent is usually a language model given a scoped objective and the ability to use tools and talk to other agents.
The common topologies are orchestrator-worker, sequential, hierarchical, parallel, debate, and network. Orchestrator-worker, where a lead agent delegates to parallel workers, is the most common production pattern.
The honest headline is cost. Anthropic reports that agents use about 4x the tokens of a chat, and multi-agent systems about 15x, and that token usage alone explained around 80% of performance variance in one of their evals.
Multi-agent does not always win. It underperforms on tightly interdependent tasks (Anthropic names coding), adds latency through coordination, and compounds errors. A UC Berkeley study found many failures are design failures, not model failures.
The decision rule from Anthropic: find the simplest solution that works and add agents only when they demonstrably improve outcomes. Reach for multi-agent when the task is high-value, parallelizable, and exceeds one context window. Otherwise a single agent or a deterministic workflow is cheaper and easier to operate.

What a multi-agent system is, versus a single agent

A multi-agent system is an architecture in which multiple autonomous agents, each with its own role, instructions, and tools, coordinate to accomplish a goal that would be hard for any single agent to complete alone. In the LLM era each agent is typically a language model given a scoped objective, a toolset, and the ability to communicate with other agents or a coordinator. Multi-agent systems are a specialization of AI agents, so the agent fundamentals carry over before any coordination is added.

The term predates LLMs. In classical distributed AI, Michael Wooldridge defines an agent as a computer system situated in an environment that is capable of flexible autonomous action to meet its design objectives, and a multi-agent system as a collection of such interacting agents that coordinate toward goals.⁵ The LLM version keeps that shape and swaps in language models as the reasoning core.

The contrast that matters for a buyer is simple. A single agent is one LLM in a loop: it reasons, calls tools, observes the result, and repeats until the task is done, with all state in one context window. A multi-agent system distributes the work across several agents. Anthropic frames the canonical structure as orchestrator-worker, where a lead agent coordinates the process while delegating to specialized subagents that operate in parallel.¹ Single-agent keeps everything in one reasoning thread, which is simpler, cheaper, and easier to debug. Multi-agent buys parallelism and separation of concerns at the cost of more tokens, more coordination, and more failure surface.
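To make "one LLM in a loop" concrete, here is a minimal sketch of the reason, act, observe cycle. Everything in it is an assumption for illustration: `fake_model` is a hard-coded stand-in for a real model call, `TOOLS` holds a single toy tool, and `run_single_agent` is a hypothetical name, not any framework's API.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: returns a tool request or a final answer.
def fake_model(context: list[str]) -> dict:
    # Ask for the toy tool once, then answer from the observation.
    if not any(line.startswith("observation:") for line in context):
        return {"type": "tool", "name": "add", "args": (2, 3)}
    return {"type": "final", "text": f"The sum is {context[-1].split(': ')[1]}"}

# Toy tool registry (assumption: one tool is enough to show the loop).
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_single_agent(task: str, model=fake_model, max_steps: int = 5) -> str:
    context = [f"task: {task}"]                      # all state lives in one context
    for _ in range(max_steps):
        step = model(context)                        # reason
        if step["type"] == "final":
            return step["text"]
        result = TOOLS[step["name"]](*step["args"])  # act
        context.append(f"observation: {result}")     # observe, then repeat
    raise RuntimeError("step budget exhausted")

answer = run_single_agent("add 2 and 3")
```

The point of the sketch is the single `context` list: there is one trace to read when something goes wrong, which is the debuggability advantage the paragraph above describes.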

One nuance avoids a common overclaim. Anthropic separates workflows, where LLMs and tools are orchestrated through predefined code paths, from agents, where the model dynamically directs its own process and tool use.² Many systems sold as multi-agent are really multi-step workflows, which is often the better and more predictable choice. Multiple LLM calls do not by themselves make a system agentic. Teams that want the deterministic version of this often start with AI workflow automation and add autonomous agents only when the workflow ceiling is reached.

Architectures and topologies

The common multi-agent topologies are orchestrator-worker (a supervisor delegates to parallel workers), sequential or pipeline (agents run in a fixed order), hierarchical (supervisors of supervisors), parallel (independent agents by section or vote), debate or critic (agents critique each other to improve reasoning), and network (decentralized many-to-many handoffs). A useful way to group them: orchestrator-worker, sequential, and hierarchical are the controlled family where you keep determinism, while debate and network are the emergent family where the model keeps control.

Each topology trades coordination, parallelism, and cost differently. The table below is the practical comparison, drawn from Anthropic's pattern writeups, the LangGraph multi-agent concepts, and the multi-agent debate research.¹ ² ⁷ ³

Multi-agent topologies compared

Six common structures, how they coordinate, and where each one earns its cost. Orchestrator-worker is the most common production pattern; sequential is the cheapest and most predictable.

Multi-agent topology comparison
Topology	Coordination	Parallelism	Cost and latency	Best for	Watch-out
Orchestrator-worker	Central lead delegates	High	High	Breadth-first research, routing to specialists	Lead-agent prompt quality decides everything
Sequential / pipeline	Fixed order, each consumes the last output	None	Low	Decomposable, predictable processes	Errors propagate forward
Hierarchical	Supervisors of supervisors	High	Highest	Large task trees, organization-shaped problems	Latency compounds at every layer
Parallel (section or vote)	Independent, then aggregate	Highest	High	Independent subtasks, self-consistency	Only helps if subtasks are truly independent
Debate / critic	Agents critique across rounds	Medium	High	Hard reasoning, factuality-sensitive answers	Large token cost for modest gains
Network / decentralized	Many-to-many handoffs, no central lead	Variable	Hard to bound	Open-ended negotiation, unknown path	Loops and miscoordination, hardest to debug

Sources: Anthropic, Building Effective Agents and How We Built Our Multi-Agent Research System; LangGraph multi-agent concepts; Du et al. (2023) on multi-agent debate.
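The sequential row in the table is the easiest to picture in code. The sketch below assumes three hypothetical stages (`extract`, `summarize`, `format_reply`) written as plain functions; in a real pipeline each would be an agent call. It is an illustration of the fixed-order shape, not a recommended implementation.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]

# Hypothetical stages; each stands in for one agent in the pipeline.
def extract(text: str) -> str:
    return text.strip().lower()

def summarize(text: str) -> str:
    return text.split(".")[0]          # keep only the first sentence

def format_reply(text: str) -> str:
    return f"Summary: {text}"

def run_pipeline(stages: list[Stage], payload: str) -> str:
    # Fixed order: each stage consumes the previous stage's output.
    return reduce(lambda acc, stage: stage(acc), stages, payload)

result = run_pipeline([extract, summarize, format_reply],
                      "  Refund approved. Ticket closed.  ")
```

Because every stage sees only what the previous one produced, a bad output early in the list flows into every later stage, which is the "errors propagate forward" watch-out in the table.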

Two of these deserve a note. Orchestrator-worker is what Anthropic's research feature runs: a lead agent plans, spins up three to five subagents in parallel, and a separate pass handles citations and synthesis.¹ Debate has the clearest research backing for quality gains: Du et al. show that having multiple agents propose and critique answers across a few rounds improves factual accuracy and arithmetic reasoning over a single-agent baseline, and that adding agents or rounds helps further.³ The caveat is that debate spends a lot of tokens for those gains, so it is justified mainly on genuinely hard reasoning.
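The plan, fan out, synthesize shape described for orchestrator-worker can be sketched with `asyncio`. This is a toy under stated assumptions: `plan`, `subagent`, and `synthesize` are hypothetical functions with hard-coded behavior, and a real system would back each with a model call and its own context.

```python
import asyncio

def plan(goal: str) -> list[str]:
    # Hypothetical lead-agent planning step; a real lead would ask a model.
    return [f"{goal} / market size", f"{goal} / key vendors", f"{goal} / regulation"]

async def subagent(subtask: str) -> str:
    # Hypothetical worker; stands in for a model loop with its own context.
    await asyncio.sleep(0.01)
    return f"notes on {subtask}"

def synthesize(goal: str, notes: list[str]) -> str:
    # Separate pass that merges worker output into one report.
    body = "\n".join(f"- {n}" for n in notes)
    return f"Report: {goal}\n{body}"

async def orchestrate(goal: str) -> str:
    subtasks = plan(goal)
    # gather runs the workers concurrently and returns results in input order.
    notes = await asyncio.gather(*(subagent(s) for s in subtasks))
    return synthesize(goal, list(notes))

report = asyncio.run(orchestrate("EV charging"))
```

The workers never see each other's output, so the design only pays off when the subtasks really are independent.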

How agents coordinate

Agents coordinate through four primitives: roles (each agent gets a scoped persona, objective, and tools), handoffs (one agent transfers control of the task to another), agents-as-tools (a manager calls a specialist like a function and keeps control), and shared state or message passing (agents read and write a common state object, or talk in messages). The teachable distinction is handoffs versus agents-as-tools: a handoff passes the mic, agents-as-tools keeps it.

The OpenAI Agents SDK makes that last distinction concrete. A handoff is exposed to the model as a tool call such as transfer_to_refund_agent, after which the receiving agent owns the next part of the interaction; use it when routing itself is part of the workflow. Agents-as-tools lets a manager agent call a specialist that returns a result without taking over the user-facing conversation, which fits a bounded subtask.⁶ For shared state, LangGraph models the whole system as a graph of nodes that pass a common state object between them, with a supervisor maintaining global state and dispatching subtasks.⁷ For message passing, AutoGen's building block is a conversable agent that initiates and replies to messages, though centralizing that through a group chat manager can become a coordination bottleneck.⁹
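The difference between the two patterns is who speaks to the user next. The sketch below shows that in plain Python; it deliberately does not use the OpenAI Agents SDK, and the `Agent` dataclass, `triage_with_handoff`, and `manager_with_tool` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    respond: Callable[[str], str]

# Hypothetical agents with canned behavior instead of model calls.
triage = Agent("triage", lambda msg: f"[triage] {msg}")
refund_agent = Agent("refund_agent", lambda msg: f"[refund_agent] refund started for: {msg}")
specialist = Agent("specialist", lambda msg: msg.upper())

def triage_with_handoff(message: str) -> tuple[str, str]:
    """Handoff: triage picks the next owner, and that agent answers the user."""
    owner = refund_agent if "refund" in message else triage
    return owner.name, owner.respond(message)

def manager_with_tool(message: str) -> tuple[str, str]:
    """Agents-as-tools: the manager calls the specialist and keeps the conversation."""
    result = specialist.respond(message)
    return "manager", f"[manager] specialist returned: {result}"

handoff_owner, handoff_reply = triage_with_handoff("refund order 42")
tool_owner, tool_reply = manager_with_tool("status check")
```

In the first function the returned owner changes to `refund_agent`; in the second it stays `manager` no matter what the specialist does.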

The single most actionable coordination lesson is about delegation quality. Anthropic found the orchestrator must give each subagent an objective, an output format, guidance on the tools and sources to use, and clear task boundaries, otherwise subagents duplicate work or leave gaps.¹ Vague delegation is the fastest way to turn a multi-agent system into an expensive way to get a worse answer. Building these coordination layers in production is exactly the work our AI agent development team does.
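One way to enforce that delegation checklist is to make the brief a typed object that refuses to render when a part is missing. The `SubagentBrief` class below is a hypothetical sketch; the four fields come from the list above, while the names and the rendering format are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SubagentBrief:
    objective: str
    output_format: str
    tools_and_sources: list[str] = field(default_factory=list)
    boundaries: str = ""

    def missing_parts(self) -> list[str]:
        # Each part of the brief must be non-empty before delegation.
        checks = {
            "objective": self.objective.strip(),
            "output_format": self.output_format.strip(),
            "tools_and_sources": self.tools_and_sources,
            "boundaries": self.boundaries.strip(),
        }
        return [name for name, value in checks.items() if not value]

    def render(self) -> str:
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"brief is missing: {', '.join(missing)}")
        return "\n".join([
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            f"Tools and sources: {', '.join(self.tools_and_sources)}",
            f"Boundaries: {self.boundaries}",
        ])

vague = SubagentBrief(objective="Research EV chargers", output_format="")
complete = SubagentBrief(
    objective="List the five largest EV charging networks in the EU",
    output_format="Table with name, country, charger count, source",
    tools_and_sources=["web_search", "operator annual reports"],
    boundaries="EU only; do not cover home chargers",
)
```

Calling `vague.render()` raises before any tokens are spent, which is cheaper than discovering the gap in the subagent's output.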

When multi-agent helps, and when it does not

Multi-agent systems help on valuable tasks that involve heavy parallelization, information that exceeds a single context window, and interfacing with many complex tools. They do not help on tightly interdependent work, low-value tasks where the roughly 15x token cost is not justified, or anything latency-sensitive. The decision rule from Anthropic is to find the simplest solution that works and add agents only when they demonstrably improve outcomes.

Start with the honest cost figure, because it reframes the whole decision. Anthropic states plainly that agents typically use about 4x more tokens than chat interactions, and multi-agent systems use about 15x more tokens than chats.¹ In one browsing-agent eval they found that token usage by itself explained about 80% of the performance variance, meaning a large share of multi-agent's advantage is simply spending more compute on the problem. On a fixed budget, that changes the math.
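A quick back-of-envelope calculation shows what those multipliers do to a budget. The 4x and 15x figures are the ones quoted above; the 2,000-token chat and the price of 10 USD per million tokens are made-up assumptions chosen only to keep the arithmetic readable.

```python
# Multipliers relative to a chat, as quoted in the text.
MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}

def estimated_cost(mode: str, chat_tokens: int, usd_per_million_tokens: float) -> float:
    """Rough cost of one task, scaling a chat-sized token count by the multiplier."""
    tokens = chat_tokens * MULTIPLIER[mode]
    return tokens * usd_per_million_tokens / 1_000_000

# Assumed inputs: a 2,000-token chat and a flat 10 USD per million tokens.
costs = {mode: estimated_cost(mode, 2_000, 10.0) for mode in MULTIPLIER}
ratio_multi_to_single = MULTIPLIER["multi_agent"] / MULTIPLIER["single_agent"]
```

Under those assumptions a task costs about 0.02 USD as a chat, 0.08 USD as a single agent, and 0.30 USD as a multi-agent run, so the multi-agent version is 3.75 times the single-agent bill for every task it handles.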

Single agent versus multi-agent system

The tradeoff that should drive the default. Use the right column only when the task is high-value, parallelizable, and exceeds one context window.

Single agent versus multi-agent tradeoff
Dimension	Single agent	Multi-agent system
Token cost	Baseline, about 4x a chat for an agent	About 15x a chat
Latency	Lower	Higher, from coordination overhead
Context	One window	Distributed across agents
Best tasks	Linear, interdependent, low to mid value	Parallelizable, breadth-first, high value
Debuggability	Easier, one trace	Harder, distributed traces
Error surface	Contained	Compounds across agents
Coding suitability	Better today	Underperforms, tasks are interdependent
When to choose	The default	Only when it demonstrably improves outcomes

Source: Anthropic, How We Built Our Multi-Agent Research System (token and suitability figures) and Building Effective Agents (latency and cost framing).

The cases where multi-agent wins are specific: breadth-first work that requires exploring many independent paths at once, context that exceeds one window so each agent gets its own budget, and tasks that touch many heterogeneous tools where specialization keeps each agent's tool surface manageable. On its own internal research evaluation, Anthropic reports a multi-agent system outperformed a single-agent Claude Opus 4 baseline by 90.2%, though that is an internal eval and not a universal benchmark.¹

The cases where it loses are just as specific. Tasks that require all agents to share the same context, or that have many dependencies between agents, underperform. Anthropic names coding directly: its subtasks are highly interdependent, and current agents are not yet good at coordinating and delegating to one another in real time.¹ Agentic systems trade latency and cost for performance, and every added agent is another place for errors to start and propagate.² The practical rule is to prefer the simplest thing that works: reach for multi-agent when the task is high-value, parallelizable, and exceeds one context window, and reach for a single agent or a deterministic workflow for everything else.
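The decision rule can be written down as a small function, which is a useful exercise in a scoping meeting. The function name, the flags, and the choice to check for a workflow first are assumptions made for this sketch; only the three multi-agent conditions and the two blockers come from the text.

```python
def choose_architecture(*, high_value: bool, parallelizable: bool,
                        exceeds_one_context: bool,
                        tightly_coupled: bool = False,
                        latency_sensitive: bool = False,
                        steps_known_in_advance: bool = False) -> str:
    # Assumption: if the steps are known up front, a coded workflow is simplest.
    if steps_known_in_advance:
        return "workflow"
    wins = high_value and parallelizable and exceeds_one_context
    blockers = tightly_coupled or latency_sensitive
    # Multi-agent only when all three conditions hold and nothing blocks it.
    if wins and not blockers:
        return "multi_agent"
    return "single_agent"

research = choose_architecture(high_value=True, parallelizable=True,
                               exceeds_one_context=True)
coding = choose_architecture(high_value=True, parallelizable=False,
                             exceeds_one_context=True, tightly_coupled=True)
invoice_routing = choose_architecture(high_value=False, parallelizable=False,
                                      exceeds_one_context=False,
                                      steps_known_in_advance=True)
```

Note that the default answer is never `multi_agent`; the caller has to argue for it with three true flags.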

Frameworks and challenges

The widely used 2026 frameworks are LangGraph (a graph with shared state for maximum control), CrewAI (role-based crews plus deterministic flows), AutoGen (conversational message passing), and the OpenAI Agents SDK (lightweight handoffs and agents-as-tools). They are convenience layers over the same underlying topologies, so choose by how much control you need and how your team models the problem, not by the framework name. The biggest challenge is not the model but the system design around it.

Each framework has a native mental model. LangGraph gives you explicit control over control flow and context as a graph of nodes with shared state, suited to production systems that need determinism.⁷ CrewAI is built on roles: you define each agent's role, goal, and backstory, assemble them into a crew, and use flows when you want a predictable pipeline.⁸ AutoGen centers on agents that converse and negotiate toward a result.⁹ The OpenAI Agents SDK offers the lightest primitives, with the clean handoff versus agents-as-tools split and built-in tracing.⁶ Because all of them implement the same underlying topologies, the framework is a convenience layer; the architecture is what actually determines outcomes.

The challenges are well documented, and most are design problems. A UC Berkeley-led study of more than 200 tasks across seven popular frameworks built the first Multi-Agent System Failure Taxonomy, with 14 failure modes in three categories: system and specification design, inter-agent misalignment, and task verification or termination. Its headline finding is that many failures stem from poor system design rather than model performance, with agents operating on incorrect assumptions, ignoring peer input, or failing to verify outputs.⁴ Beyond that taxonomy, the recurring practical challenges are the 15x token cost, latency and coordination overhead, error propagation and loops in decentralized topologies, and the observability burden of following distributed traces, which is why frameworks ship tracing and explicit graph state in the first place.
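Two of the failure categories above, missing verification and missing termination, are cheap to guard against in code. The sketch below is one possible guard for a decentralized handoff chain, not a technique taken from the study: `run_network`, the hop budget, and the `verify` callback are all hypothetical.

```python
from typing import Callable, Optional

# Each agent returns (name of next agent or None when done, updated payload).
Step = Callable[[str], tuple[Optional[str], str]]

def run_network(agents: dict[str, Step], start: str, payload: str,
                verify: Callable[[str], bool], max_hops: int = 6) -> str:
    current = start
    for _ in range(max_hops):
        next_agent, payload = agents[current](payload)
        if next_agent is None:
            # Termination reached: check the output before accepting it.
            if not verify(payload):
                raise ValueError("final output failed verification")
            return payload
        current = next_agent
    # Hop budget spent without a final answer: treat it as a loop.
    raise RuntimeError(f"no termination after {max_hops} hops")

# Toy agents: one well-behaved chain and one that hands off forever.
ok_chain: dict[str, Step] = {
    "drafter": lambda p: ("reviewer", p + " drafted"),
    "reviewer": lambda p: (None, p + " reviewed"),
}
looping: dict[str, Step] = {
    "a": lambda p: ("b", p),
    "b": lambda p: ("a", p),
}

final = run_network(ok_chain, "drafter", "doc", verify=lambda s: s.endswith("reviewed"))
```

Running `run_network(looping, "a", "doc", verify=lambda s: True)` raises `RuntimeError` after six hops instead of burning tokens indefinitely.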

Frequently asked

Multi-agent systems questions

What is a multi-agent system in AI?

A multi-agent system is an architecture where several autonomous AI agents, each with its own role, instructions, and tools, coordinate to complete a task that is hard for one agent alone. In modern LLM systems each agent is usually a language model given a scoped objective and the ability to use tools and communicate with other agents or a central coordinator. The concept predates LLMs and comes from classical distributed AI, where a multi-agent system is a collection of interacting autonomous agents that coordinate toward a goal.

What is the difference between single-agent and multi-agent systems?

A single agent runs one LLM in a reason-and-act loop with all state in one context window, which is simpler, cheaper, and easier to debug. A multi-agent system splits the work across specialized agents to gain parallelism and separation of concerns, but it uses far more tokens, with Anthropic reporting about 15x the tokens of a normal chat, and it adds coordination and failure surface. Use multi-agent for high-value, parallelizable tasks, and use a single agent or a deterministic workflow for everything else.

What are the main multi-agent architectures or topologies?

The common topologies are orchestrator-worker, where a supervisor delegates to parallel workers; sequential or pipeline, where agents run in a fixed order; hierarchical, which stacks supervisors of supervisors; parallel, where independent agents work by section or vote; debate or critic, where agents critique each other to improve reasoning; and network, which is decentralized many-to-many handoffs. Orchestrator-worker is the most common production pattern, used in systems like Anthropic's research feature.

When should you not use a multi-agent system?

Avoid a multi-agent system in three cases: tightly interdependent tasks, where Anthropic specifically names coding; low-value tasks, where the roughly 15x token cost is not justified; and latency-sensitive work, since coordination adds wall-clock time. A single agent or a predefined workflow is usually the better default. Add agents only when they demonstrably improve outcomes, because more agents and more steps mean more places for errors to start and propagate.

What frameworks are used to build multi-agent systems?

The most widely used 2026 frameworks are LangGraph, which gives a graph with shared state and maximum control; CrewAI, which uses role-based crews and deterministic flows; AutoGen, which is built on conversational message passing; and the OpenAI Agents SDK, which offers lightweight handoffs and agents-as-tools. They are convenience layers over the same underlying topologies, so the deciding factor is how much control you need and how your team models the problem.

Kanika Mathur

Head of Service Delivery, Resourcifi

Kanika Mathur is Head of Service Delivery at Resourcifi, where her engineering pods build orchestrator-worker and pipeline agent systems in production and have learned the hard way where a second agent earns its token bill and where it just adds latency and failure surface. The honest tradeoff framing in this guide reflects what those teams ship, scope, and sometimes talk a client out of.
