
Harnesses That Hire Other Harnesses

The next bottleneck in AI engineering is not model capability; it is delegation between harnesses. A senior harness plans, cheap harnesses execute in isolation, and the senior verifies before accepting the work back. Here is the pattern we run in production.

INFINITEWARE Engineering · July 3, 2026 · 8 min read

The previous article on this blog was about a trust problem inside a single agent. This one is about the version of that problem that shows up the moment you connect two of them.

Not two models. Two harnesses. A harness is the whole runtime around a model: tools, sandbox, memory, cost policy, verification hooks, the shell it runs in. The model is what thinks. The harness is what acts. That distinction is doing more work in production AI today than any specific model choice.

The interesting move of 2026 is not that harnesses have gotten smarter, though they have. It is that harnesses have started hiring each other. A senior harness plans a piece of work, hands the execution to a cheaper harness running on a fixed subscription, then verifies what came back before accepting it. If you have ever run a team of engineers, this will look familiar. That is not an accident. It is the same problem.

[Figure: the Planner (senior harness, trusted; plans and reviews) delegates to Executors 1, 2 and 3 (junior harnesses, isolated; each with a sandbox and a fixed budget), whose output goes to the Verifier (senior harness, skeptic; rejects by default), which returns approved, rejected, or feedback.]
The senior harness plans and reviews. Junior harnesses execute in isolation. The verifier gates what comes back.

Three roles, one runtime

In our production setup at INFINITEWARE, every long-running task ends up decomposed into three roles. The important thing is that these are roles inside a harness org, not different models. The same underlying model can play any of them. The differentiation is what the harness gives it: tools, memory, sandbox, budget.

  • The planner decomposes the work, writes an explicit brief for each sub-task, and holds the trust. This role stays on your most capable model. It is where taste lives.
  • The executor is deliberately narrow. It gets a sandbox, a single brief, no memory of the wider mission, and a hard budget. This is where you push work to the cheapest capable harness. In 2026 that usually means a subscription-plan coding harness or a fast small model with tools.
  • The verifier is a skeptic. Its job is to reject. Same model as the planner, different prompt, different context, different tools. It does not care about the brief, only about what came back and whether it holds up.
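To make the "same model, different harness" point concrete, here is a minimal sketch of the three roles expressed as configuration. The `HarnessConfig` type, the model identifiers, the prompts, the tool names, and the budget figure are all hypothetical; the only thing taken from the text is that the planner and verifier share a model while the harness wrapped around it (prompt, tools, memory, sandbox, budget) differs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model identifiers: planner and verifier share one model.
SENIOR_MODEL = "senior-model"
CHEAP_MODEL = "cheap-coding-model"

@dataclass(frozen=True)
class HarnessConfig:
    role: str
    model: str
    system_prompt: str
    tools: Tuple[str, ...]
    mission_memory: bool         # does this role see the wider mission?
    sandboxed: bool
    budget_usd: Optional[float]  # hard cap per brief; None = not capped at this layer

PLANNER = HarnessConfig(
    "planner", SENIOR_MODEL, "Decompose the task and write one brief per sub-task.",
    ("read_repo", "write_brief"), mission_memory=True, sandboxed=False, budget_usd=None,
)
EXECUTOR = HarnessConfig(
    "executor", CHEAP_MODEL, "Complete this brief and nothing else.",
    ("edit_file", "run_tests"), mission_memory=False, sandboxed=True, budget_usd=2.00,
)
VERIFIER = HarnessConfig(
    "verifier", SENIOR_MODEL, "Try to refute this change. Default answer: rejected.",
    ("read_diff", "run_tests"), mission_memory=False, sandboxed=True, budget_usd=None,
)
```

The differentiation lives entirely in the fields other than `model`, which is the claim the role list makes.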

The rule we hold is that no output crosses back into shared state until the verifier signs off. Not a soft nudge, a hard gate. Silent partial success is the failure mode that eats projects when this rule is optional.
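A minimal sketch of what a hard gate, as opposed to a soft nudge, can look like in code. It assumes a hypothetical `SharedState` store standing in for the main branch or any other shared state; the write path itself refuses anything that lacks an approving verdict, so there is no code path for "merge anyway".

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str

class GateError(Exception):
    """Raised when output without an approving verdict tries to enter shared state."""

@dataclass
class SharedState:
    # Hypothetical stand-in for the main branch or any other shared state.
    accepted: List[str] = field(default_factory=list)

    def merge(self, output: str, verdict: Optional[Verdict]) -> None:
        # Hard gate: a missing verdict is treated exactly like a rejection.
        if verdict is None or not verdict.approved:
            raise GateError("no approving verifier verdict; output not merged")
        self.accepted.append(output)
```

Treating a missing verdict the same as a rejection is what rules out silent partial success slipping through when a verifier call fails or is skipped.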

Why this is not just prompt chaining

If you have only ever built agents by chaining prompts through the same model, this reads like a heavier version of that. It is not. Prompt chaining shares one context, one memory, one identity. A harness org does the opposite: each harness runs in its own process, its own sandbox, its own tool inventory, its own budget. They cannot see each other's context by default. They talk through a narrow interface, exactly the way engineers on a team talk through tickets and code review rather than by reading each other's minds.
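One way to keep the interface narrow is to make the only thing that crosses between harnesses an explicit, serialized message. This sketch assumes hypothetical `Brief` and `Report` types sent as JSON; no shared context object or memory handle is passed, so each side sees only the fields the other chose to send.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

@dataclass(frozen=True)
class Brief:
    task_id: str
    instructions: str
    allowed_paths: List[str]
    budget_usd: float

@dataclass(frozen=True)
class Report:
    task_id: str
    diff: str
    notes: str  # the executor's own account; the verifier does not rely on it

def to_wire(message) -> str:
    # Only the declared dataclass fields cross the boundary, as plain JSON.
    return json.dumps(asdict(message))

def brief_from_wire(wire: str) -> Brief:
    return Brief(**json.loads(wire))

def report_from_wire(wire: str) -> Report:
    return Report(**json.loads(wire))
```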

The value of the separation is not architectural elegance. It is that when the junior harness goes off the rails, and it will, the damage is bounded by design. It cannot rewrite files outside its sandbox or spend past its budget, and it cannot misrepresent what it did, because the verifier is not reading its notes; it is reading its output.
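A sketch of the first two bounds, assuming a hypothetical `Sandbox` wrapper that every executor tool call passes through. It refuses writes whose resolved path falls outside the sandbox root, and refuses any call that would push spend past the budget before the money is spent rather than after.

```python
from pathlib import Path

class BudgetExceeded(Exception):
    pass

class SandboxViolation(Exception):
    pass

class Sandbox:
    """Hypothetical wrapper that every executor tool call goes through."""

    def __init__(self, root: str, budget_usd: float):
        self.root = Path(root).resolve()
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the call up front if it would cross the cap.
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + cost_usd:.2f} > {self.budget_usd:.2f}"
            )
        self.spent_usd += cost_usd

    def check_write(self, path: str) -> Path:
        # Resolve '..' and symlinks so a relative or absolute path cannot escape the root.
        target = (self.root / path).resolve()
        if not target.is_relative_to(self.root):  # Python 3.9+
            raise SandboxViolation(f"{path} is outside {self.root}")
        return target
```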

[Figure: one turn of the loop across the isolation boundary between the senior and junior harness: 1 · Plan (write the brief), 2 · Delegate (hand off to junior), 3 · Execute (work in sandbox), 4 · Verify (read the diff).]
One turn of the loop. The planner delegates, the executor works in isolation, the verifier gates the merge.
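The four steps of one turn can be sketched as a single function. The three role callables are hypothetical stand-ins for calls into the senior and junior harnesses; what the sketch is meant to show is the data flow: only the brief crosses into the junior, and the verifier judges the diff against the brief rather than the executor's narration of it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Outcome:
    merged: bool
    diff: str
    reason: str

def one_turn(
    task: str,
    plan: Callable[[str], str],                      # senior: task -> brief
    execute: Callable[[str], str],                   # junior: brief -> diff
    verify: Callable[[str, str], Tuple[bool, str]],  # senior: (brief, diff) -> (ok, reason)
) -> Outcome:
    brief = plan(task)                # 1. PLAN: write the brief
    diff = execute(brief)             # 2-3. DELEGATE and EXECUTE: only the brief crosses over
    ok, reason = verify(brief, diff)  # 4. VERIFY: judge the diff against the brief
    return Outcome(merged=ok, diff=diff, reason=reason)
```

For testing, each callable can be a plain function or lambda; in production each would wrap a real harness invocation.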

Trust boundaries: the part nobody talks about

The pattern falls apart if you get the trust boundaries wrong. There are four we tune for every delegation, and getting any one of them wrong is enough to make the whole setup unsafe.

  • Filesystem. The junior harness runs against a fresh git worktree, not the main checkout. It cannot see files it was not given, cannot damage files it was not asked to touch, and its worktree is auto-deleted if it made no changes worth keeping.
  • Network. Executors get a whitelist, not the open internet. Package registries yes, arbitrary POST endpoints no. The planner decides what an executor is allowed to reach.
  • Secrets. The junior never sees a credential it does not strictly need. The planner substitutes references for real secrets in the brief, and the harness resolves them at execution time.
  • Output authority. The junior can propose. It cannot commit, push, deploy, message a human, or spend money. All of those live in the planner or a dedicated action harness, gated by the verifier's sign-off.
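The network, secrets, and output-authority boundaries reduce to small, checkable policies. A minimal sketch, assuming a hypothetical host allowlist, a hypothetical `${secret:NAME}` placeholder syntax for secret references, and a hypothetical set of reserved action names. Restricting allowed hosts to GET requests is an assumption made for this sketch to approximate "package registries yes, arbitrary POST endpoints no". The filesystem boundary is handled by the git worktree and is not shown.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist the planner chose for one brief.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}

# Hypothetical action names a junior harness may never perform itself.
RESERVED_ACTIONS = {"commit", "push", "deploy", "message_human", "spend"}

# Hypothetical reference syntax the planner writes into briefs instead of secrets.
SECRET_REF = re.compile(r"\$\{secret:([A-Za-z0-9_]+)\}")

def network_allowed(url: str, method: str = "GET") -> bool:
    host = urlparse(url).hostname or ""
    # Allowlisted hosts only, and only for reads.
    return host in ALLOWED_HOSTS and method.upper() == "GET"

def resolve_secrets(command: str, vault: dict) -> str:
    # The brief carries only ${secret:NAME}; the harness substitutes the value at
    # execution time, so the brief itself never contains the credential.
    # A reference missing from the vault raises KeyError rather than passing through.
    return SECRET_REF.sub(lambda m: vault[m.group(1)], command)

def executor_may(action: str) -> bool:
    # Juniors propose; reserved actions live with the planner or an action harness.
    return action not in RESERVED_ACTIONS
```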

A harness is only useful in proportion to what you dare let it change. Widen that circle without widening verification and you have not built delegation; you have built a bug at scale.

How we run this in production

The concrete instance of this pattern that ships at INFINITEWARE looks like this. A senior harness, running on our Claude subscription, plans the work and dispatches it. For bulk execution the plan is handed to a cheap coding harness running on a fixed-cost subscription, which makes the actual file edits inside a disposable worktree. On the other side, the senior harness reads the diff, runs the tests, and either accepts the change, rejects it with feedback, or rejects it hard. On a good day we spend one senior turn planning, one cheap-tier subscription turn executing, and one senior turn verifying. On a bad day the verifier catches something ugly, the loop repeats, and we still spend less than we would have on a single senior-only run.
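The accept / reject-with-feedback / reject-hard loop can be sketched as follows. The `Decision` names and the `execute`/`verify` callables are hypothetical. Each attempt is verified against the original brief, not the amended one, and `max_attempts` caps how many senior verification turns one delegation can consume.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

class Decision(Enum):
    ACCEPT = "accept"
    REJECT_WITH_FEEDBACK = "reject_with_feedback"  # retry with specific feedback
    REJECT_HARD = "reject_hard"                    # stop; hand back to the planner

def delegate(
    brief: str,
    execute: Callable[[str], str],                     # cheap, flat-cost tier
    verify: Callable[[str, str], Tuple[Decision, str]],  # senior, per-token tier
    max_attempts: int = 3,
) -> Tuple[Decision, Optional[str], int]:
    """Return (decision, accepted diff or None, attempts used)."""
    current_brief = brief
    for attempt in range(1, max_attempts + 1):
        diff = execute(current_brief)
        decision, feedback = verify(brief, diff)  # judge against the original intent
        if decision is Decision.ACCEPT:
            return decision, diff, attempt
        if decision is Decision.REJECT_HARD:
            return decision, None, attempt
        # Fold the verifier's specific objections into the next attempt's brief.
        current_brief = f"{brief}\n\nPrevious attempt rejected:\n{feedback}"
    # Retry budget exhausted: escalate instead of looping forever.
    return Decision.REJECT_HARD, None, max_attempts
```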

[Figure: cost tiers per turn. Tier 1, Plan: senior harness, per-token cost, latency in seconds. Tier 2, Execute: subscription-plan harness, flat monthly cost, latency from seconds to minutes. Tier 3, Verify: senior harness, per-token cost, latency in seconds. Most work spends fixed-cost minutes in the middle tier; the senior tiers charge per token, only on plan and verify.]
Cost tiers per turn. Plan and verify are per-token on a senior harness. Execute is fixed-cost on a subscription-plan junior.

The economics of this shift matter more than the architecture. The subscription-plan executor tier is a fixed monthly cost. Once you have paid for it, additional work through it is effectively free at the margin. If your senior harness can trust the output, that changes what work is worth attempting. Refactors that were too expensive to justify become routine. Sweeps across a large codebase stop being a scary quarterly project and become a Tuesday.
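The "free at the margin" argument is simple arithmetic. A sketch of a per-task cost model, assuming plan is billed once and verify once per attempt on the senior tier, and execution is a flat monthly fee spread across the month's tasks. Every number used with it below is hypothetical, not our actual pricing.

```python
def cost_per_task(
    plan_tokens: int,
    verify_tokens: int,
    usd_per_million_tokens: float,
    flat_monthly_usd: float,
    tasks_per_month: int,
    attempts: int = 1,
) -> float:
    """Senior tokens per task plus this task's share of the flat executor fee."""
    # Assumption: one planning turn, one verification turn per execution attempt.
    senior_tokens = plan_tokens + attempts * verify_tokens
    senior_usd = senior_tokens * usd_per_million_tokens / 1_000_000
    # Re-executions cost nothing extra on the flat tier; only the share shrinks.
    amortized_executor_usd = flat_monthly_usd / tasks_per_month
    return senior_usd + amortized_executor_usd
```

With hypothetical figures of 20,000 planning tokens, 30,000 verification tokens, $15 per million senior tokens, a $200 monthly executor plan and 400 tasks a month, `cost_per_task(20_000, 30_000, 15.0, 200.0, 400)` comes to $0.75 of senior tokens plus $0.50 of amortized subscription, or $1.25. Doubling the task count leaves the senior share unchanged and halves the amortized share, which is the sense in which extra executor work is free at the margin.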

Where it breaks

We have watched this pattern fail in the same four ways often enough to name them:

  • Silent partial success. The junior returns a change that looks correct and passes a shallow test but misses the point of the brief. Only a verifier that reads the diff against the original intent, not just the diff on its own, catches this.
  • Verifier drift. Verifiers prompted only to 'check the work' tend to rubber-stamp. The prompt has to instruct the verifier to try to refute the change. Default answer: rejected. Only overturn on evidence.
  • Cost runaway on retry. A poorly bounded loop can call the senior verifier six times on the same rejected work. Every delegation needs a max-retries cap, and every failure needs to be actionable, not 'please try again'.
  • Planner over-decomposition. The temptation is to split every task into as many sub-tasks as possible. Real work has coupling. Splitting a change into fifteen tickets when it should have been three creates an integration problem the verifier is not equipped to solve.
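For verifier drift and cost runaway, the verifier's decision rule can start from "rejected" and only flip on evidence, and its feedback can be checked for actionability before another retry is spent. A minimal sketch; the `Check` type and the list of vague phrases are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical phrases that give an executor nothing to act on.
VAGUE_FEEDBACK = {"", "please try again", "try again", "not quite", "fix it"}

@dataclass(frozen=True)
class Check:
    name: str     # e.g. "tests pass", "diff matches the brief's intent"
    passed: bool
    detail: str   # what failed, specifically

def verdict(checks: List[Check]) -> Tuple[bool, str]:
    """Default answer is rejected; only overturn when every check passed."""
    if not checks:
        return False, "no evidence collected; rejected by default"
    failures = [c for c in checks if not c.passed]
    if failures:
        return False, "; ".join(f"{c.name}: {c.detail}" for c in failures)
    return True, "all checks passed"

def is_actionable(feedback: str) -> bool:
    # A rejection the executor cannot act on just burns another retry.
    return feedback.strip().lower() not in VAGUE_FEEDBACK
```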

The employee analogy, taken seriously

We call this the employee analogy not because it is a cute framing, but because every hard-won lesson from managing engineering teams maps onto it. Junior engineers who cannot be trusted with production access get sandboxed environments. Reviews are gates, not suggestions. Senior time is expensive, so you spend it on planning and review, not on typing. New hires get narrow tasks with clear briefs, not open-ended missions. Bad performers get fewer briefs, not the same brief with more pep talk.

The harnesses we run now range from three years old in the oldest case to six months old in the newest. On the upside, the loop is fast enough that a senior harness can delegate an afternoon's worth of work to a junior in under a minute. On the downside, the junior can still burn cost on a task it never had a chance of finishing correctly, which is exactly what happens with humans. The lesson from the engineering-team version was to give the junior an out: mark the task stuck, escalate it back, do not hide it. We are doing the same thing with the harnesses now.
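Giving the junior an out can be as simple as a status it is allowed to return instead of a diff. A sketch, assuming hypothetical `Status` and `ExecResult` types and plain string routing labels: a stuck result goes back to the planner with its reason, rather than being retried or passed to the verifier.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    DONE = "done"
    STUCK = "stuck"  # the executor's explicit out

@dataclass(frozen=True)
class ExecResult:
    status: Status
    diff: str = ""
    reason: str = ""  # why it is stuck, in the executor's own words

def route(result: ExecResult) -> str:
    """Decide the next step for one executor result (hypothetical routing labels)."""
    if result.status is Status.STUCK:
        # Do not retry the same brief: escalate to the planner with the reason.
        return f"escalate: {result.reason or 'no reason given'}"
    return "verify"
```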

If you are starting from scratch on this pattern, do not build the org. Build one delegation. One planner. One executor. One verifier. Hard trust boundaries. One loop. Get it working end to end. Only then start adding lanes. Complexity is the enemy of trust, and trust is still the whole game.

Written by

INFINITEWARE Engineering

We are a Bahrain-based AI company shipping sovereign, on-premise systems for government, finance, energy, and legal across the GCC since 2008. Forty-plus clients. Sixteen products in production. We write here when we have something specific worth sharing from the work.
