One Agent Isn't Enough

Agentic coding has a problem - variance. What if single-agent runs are leaving performance on the table by design?

Due to the stochastic nature of LLMs, each agent run has slight variations. Even with the same context, one session might land near the "peak" of what we could expect (rolling a 20 on a d20), and another might land somewhere in the middle of the narrowed probability curve (rolling a 10 instead).

In Part 1 I talked about my mental model around context engineering: the goal of context engineering is to shift the probability distribution of LLM responses, where the "probability distribution" is the space of all possible results from the LLM.

In this piece, I'll talk about extending that mental model to adjust for the fact that we can easily and (relatively) cheaply trigger parallel agent runs.

The goal of context engineering is more than just decreasing the standard deviation and improving the mean quality of the probability distribution - it's also to reliably converge on the best peak that distribution has to offer.

The Limitations of Context Engineering

Using context engineering best practices (prompt engineering, references to relevant documentation, targeted addition of tools, skills, etc.) to shift the probability distribution handles the first-order problem: reduce the likelihood of bad outcomes and raise the floor, improving the average quality of responses.

But it doesn't solve the exploration problem. Even within the improved distribution, there are multiple paths the agent can take to solve the problem. Some are decent. Others are optimal. A few will make you want to slam your head into the keyboard. A single agent run picks one path. You don't know if it's the peak, or just high enough to satisfy you.

The Second Mental Model

SOLUTION LANDSCAPE

Single Agent:                        Parallel Agents + Synthesis:

   ▲                                    ▲
   │      ╱╲                            │      ╱╲
   │     ╱  ╲    ╱╲                     │     ╱  ╲     ╱╲
   │    ╱    ╲  ╱  ╲                    │    ╱ ②  ╲  ╱   ╲
   │   ╱      ╲╱    ╲   ╱╲              │   ╱       ╲╱  ④ ╲   ╱╲
   │  ╱    ①         ╲╱   ╲            │  ╱   ①            ╲╱   ╲
   │ ╱                      ╲           │ ╱        ③    ⑤        ╲
   └────────────────────────────►       └────────────────────────────►

   You get what you get               Synthesizer picks ② as winner

The solution to the problem is parallel agents (and lots of tokens).

With parallel agents, we take a "sample" multiple times (i.e. multiple runs of the same or similar prompt), explore different peaks, and use the findings from the group to converge on the best solution. We're able to hedge our bets and, using the wisdom of the crowd, consistently get more insight out of the LLM.

Why does this work? I'm not making $100M at Meta as an AI researcher, so I can't answer definitively - but I'll do my best to speculate.

Multiple Samples: This is the main one that I've been mentioning. Five agents = five independent samples. You're not relying on a single path and some luck to find the peak.
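
Here's the back-of-envelope version of that intuition. Both the probability p and the independence assumption are mine, and they're shaky (parallel runs share the same prompt, model, and codebase), so treat this as directional rather than exact:

    # Toy model: treat each agent run as an independent draw that lands
    # near the "peak" with probability p (an illustrative number, not a measurement).
    def p_at_least_one_near_peak(p: float, n: int) -> float:
        """Chance that at least one of n independent runs lands near the peak."""
        return 1 - (1 - p) ** n

    print(p_at_least_one_near_peak(0.2, 1))   # 0.20 - a single run usually misses
    print(p_at_least_one_near_peak(0.2, 5))   # ~0.67 - five runs more than triple the odds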

A single agent run might settle on a suboptimal solution - a local optimum that works but isn't great. It found something functional and stopped exploring. Parallel agents with independent starting points can escape these traps. They explore different regions of the problem space, pushing past mediocre solutions to find better ones. The convergence pattern reveals when multiple paths lead to the same superior approach.

Different Starting Points: Clean context windows mean no anchoring bias. Each agent explores from a fresh perspective.

Validation Through Repetition: When two agents independently suggest the same approach, that's evidence it's a local maximum. When all agents diverge, you need more constraints.

The parallel structure turns agentic coding from a single random draw into a guided search for peaks.

How I Use Parallel Convergence

I use parallel convergence primarily in two ways, which fall into two workflows:

  1. Generating multiple solutions to a problem
  2. Gathering information from multiple sources about a problem

Here's how it works:

Note that I primarily use Claude Code, which supports subagents via an orchestrator pattern: one main agent spawns subagents and later synthesizes their results (I sketch the shape of this below).

PARALLEL CONVERGENCE WORKFLOW

         Phase 1                    Phase 2
         GATHER                     SOLVE

         A   B   C                  X   Y   Z
         │   │   │                  │   │   │
         └───┼───┘                  └───┼───┘
             ▼                          ▼
         synthesize ──── plan ───▶ synthesize ──▶ execute
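
To make that fan-out / fan-in shape concrete, here's a minimal Python sketch of the orchestrator pattern. This is not Claude Code's actual implementation (I don't have visibility into that), and run_subagent is a hypothetical stand-in for "spawn an agent with this prompt and return its findings":

    from concurrent.futures import ThreadPoolExecutor

    def run_subagent(prompt: str) -> str:
        # Hypothetical stand-in for an actual agent call.
        return f"findings for: {prompt.splitlines()[-1]}"

    def orchestrate(task: str, angles: list[str]) -> list[str]:
        # Fan out: each subagent gets the same task framed from a different angle.
        prompts = [f"{task}\n\nApproach this from the {angle} angle." for angle in angles]
        with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
            findings = list(pool.map(run_subagent, prompts))
        # Fan in: the main agent synthesizes these findings into a plan or fix.
        return findings

In Claude Code I never write this myself - the main agent handles the fan-out and fan-in when asked to use subagents - but keeping the shape in mind helps when deciding how many angles to request.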

Generate multiple solutions to a problem

In this workflow, I'll use multiple agents to come up with solutions to the same problem. The goal is to de-risk the chance that any one agent lands on a sub-par solution.

When spinning up the subagents, Claude may assign them different angles from which to approach the problem, allowing main Claude to explore more of the problem space.

For example, if I'm debugging why a modal renders behind everything despite z-index: 9999 (we've all been there), Claude might approach the problem from data flow, React hooks, and component layering perspectives.

Claude then synthesizes, validates, and proposes a solution based on the outputs from all subagents. If 3 of 5 subagents arrive at a similar solution, that solution is more likely to be what we want, and we move forward with it.
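
The "3 of 5" heuristic is easy to picture as code. Here's a toy version, assuming each subagent's proposal has already been boiled down to a short label (in reality the main agent does this grouping fuzzily, in natural language, not by exact string match - and the labels below are mine, not real output):

    from collections import Counter

    def pick_consensus(proposals: list[str], min_votes: int = 3) -> str | None:
        """Return the proposal most subagents converged on, or None if there's no consensus."""
        label, votes = Counter(proposals).most_common(1)[0]
        return label if votes >= min_votes else None

    # Toy labels for the z-index bug: 3 of 5 subagents blame a stacking context.
    proposals = ["stacking context", "portal missing", "stacking context",
                 "stacking context", "z-index war"]
    print(pick_consensus(proposals))   # "stacking context"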

I most commonly use this in debugging cases, but it's also been useful in the planning phase of a more complicated task.

Gather information about a problem

As part of my planning workflow in Claude Code, I dispatch multiple intelligence-gathering subagents. Here are some examples of the agents I'll use:

  • Agent A: scan git history (what patterns exist?)
  • Agent B: search local documentation (what's been tried before?)
  • Agent C: map code paths (what interfaces are available?)
  • Agent D: analyze test coverage (what validation already exists?)
  • Agent E: identify constraints (what are the boundaries?)
  • Agent F: find risks (what do we need to watch out for?)
  • Agent G: web research (what do online resources say?)

Yes, seven agents is excessive. I won't unleash all seven at once (that's chaos) - but having the full menu available matters.

Each explores independently, approaching the problem from a slightly different perspective and with a chance to discover different information. The goal differs from the previous workflow: there, we dispatched agents to propose solutions to the same problem; here, we dispatch agents to gather distinct but complementary information about the problem.
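
To make the "menu" concrete, here's roughly how I think of it: a named set of research prompts the main agent can choose from. The prompt wording below is illustrative, not copied from my actual setup:

    # Illustrative menu of research subagents; prompts are paraphrased, not my real ones.
    RESEARCH_MENU = {
        "git_historian":     "Scan git history for patterns relevant to: {task}",
        "doc_searcher":      "Search local documentation for prior attempts at: {task}",
        "code_mapper":       "Map the code paths and interfaces touched by: {task}",
        "test_auditor":      "Summarize existing test coverage around: {task}",
        "constraint_finder": "List the constraints and boundaries that apply to: {task}",
        "risk_finder":       "List the risks and failure modes to watch for in: {task}",
        "web_researcher":    "Summarize what online resources say about: {task}",
    }

    def pick_prompts(task: str, names: list[str]) -> list[str]:
        # Dispatch only the subset that fits the task; all seven at once is chaos.
        return [RESEARCH_MENU[name].format(task=task) for name in names]

In practice I just describe the angles I care about in the planning prompt and let Claude Code decide how to split them across subagents; the point is that each one gets a distinct, complementary question.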

With this information in hand (and context sufficiently primed), Claude can then proceed with making a plan for solving the problem. (If you want to double down on parallelism, you can also use parallel agents for planning).

What Convergence Looks Like

Here's an example from an "AI hedge fund" project I'm working on for model evaluation.

The problem: the AI could articulate detailed failure modes (good), but claimed Sharpe ratios that would make Renaissance Technologies jealous (bad). It had the form of institutional risk documentation without the calibration of realistic return expectations. I needed to update the prompts to address this case.

I launched 4 parallel intelligence-gathering agents:

  • Agent A (Intelligence Gatherer): Found similar past tasks related to commits adding calibration to edge scoring
  • Agent B (Extracting Patterns from Codebase): Found the same edge scoring calibration pattern in the codebase, noted it was "proven effective"
  • Agent C (Git Historian): Found the exact same commit history, described 3 "calibration improvement waves" over 7 weeks
  • Agent D (Web Researcher): Found Ken French Data Library and AQR research with actual factor premium numbers (momentum: 5-8% annually, quality: 2-4%)

All four agents, exploring from completely different angles (pattern database, codebase analysis, git history, web research), converged on the same solution: add calibration guidance using the existing Anti-Patterns section format, grounded in historical factor data.

Even better: Agent A initially suggested a 60-line dedicated section. But when Claude synthesized all the findings, the convergence pattern showed a simpler path - a 5-line addition to the existing Anti-Patterns section would achieve the same goal without context bloat.

How does Claude actually synthesize? Honestly, I don't control that directly - it's part of Claude Code's orchestrator pattern. But I can see what happens: it weights agreement heavily, surfaces outliers worth considering, and, critically, tends toward simpler solutions when convergence supports it (with additional prompting pushing for simplicity). That's how a 60-line suggestion became 5.

The convergence told me two things:

  1. The solution was validated (4 independent explorations → similar conclusions)
  2. The minimal version was sufficient

The cost was ~10 minutes of parallel agent time, and maybe 200k tokens total. My Claude Code usage limits weep, but the payoff was a high-confidence solution with evidence from multiple independent sources, plus the discipline to keep it simple.

If this sounds excessive for a 5-line change, it is! That's kind of the point.

Even for a ~5-line prompt change, it was worth grounding those lines in past decisions, web research, and agent consensus.

When NOT to Use This

The multi-agent approach comes with real drawbacks:

  • Token use
  • Context bloating in main agent from the additional information
  • Time waiting for agents

A single agent is largely sufficient for well-defined tasks, simple changes, or easy bugs. For murkier problems - tricky bugs, or planning a more complicated task - I reach for the parallel workflow.

From Random Walk to Guided Convergence

WHAT CONVERGENCE TELLS YOU

     Agents Agree                     Agents Diverge
     ─────────────                    ───────────────

       "caching"                        "caching"
       "caching"                        "rewrite DB"
       "caching"                        "add index"
           │                                │
           ▼                                ▼
        EXECUTE                     1. Tighten constraints
                                    2. Ask user for opinions on path forward

Part 1's model: better context engineering → better distribution → better average outcomes

In this model: better context → better distribution → parallel exploration → convergence validation → optimal outcomes, reliably

To summarize: Context engineering creates the right distribution. Parallel convergence finds the peaks within it.

Next Steps

Next time you're debugging a tricky issue, spin up 3 parallel agents, and see if they're able to find something that 1 agent alone couldn't.

This probably sounds more systematic than it felt. In practice, it's a lot of "let's see what happens"! If you have learnings from a similar workflow, or thoughts on the article, reach out to me - I'd love to discuss what you're doing!

— Ben

Part 2 of a series on context engineering and building with AI coding agents. Part 1 introduced probability distributions and information architecture. In subsequent pieces, I'll go into specifics on my workflow, and what I've learned from building KOUCAI.

About Me

Software engineer at MongoDB, building koucai.chat on the side. I write about what works (and what breaks) when working with AI.