Editor's Note: As AI agents become cheaper and easier to invoke, software development is entering a new phase: the question is no longer whether more agents can be launched, but whether humans still have enough attention to manage, judge, and merge their outputs.
This article introduces a thought-provoking concept: the "Orchestration Tax." Starting an agent costs very little, requiring only a prompt or a click. The real expense lies in the steps that follow: checking whether the results are correct, understanding their impact on the system architecture, handling conflicts between different agents, and ultimately deciding which code can enter the main branch. These tasks cannot be easily parallelized and still rely on the same serial resource: human judgment.
The author likens developers to the "GIL" of an AI agent system, the single lock that limits the system's overall throughput. Multiple agents can run simultaneously, but once they reach the stage of architectural judgment, code review, and conflict resolution, they must pass through the developer's brain one at a time. More agents therefore do not necessarily mean higher output; they may only lengthen the review queue and push developers into more frequent context switching and cognitive fatigue.
This is also a point that is easily overlooked in the current AI programming tools trend: the sense of efficiency and actual productivity are not always the same thing. A dashboard full of running agents can create the illusion of "high productivity"; but if developers do not truly understand, review, and integrate these changes, the system may accumulate technical debt and cognitive load instead of productivity.
What this article really discusses, then, is not "how to use more agents" but "how to redesign workflows around human attention." In the age of agents, the key ability is not just asking questions and assigning tasks. It is knowing which tasks can be handed to machines for parallel processing and which must be reserved for human judgment, when to review in batches, and when to stop orchestrating and refocus on a single core problem.
AI is expanding the concurrent capacity of software production, but human attention remains the scarcest and most irreplaceable resource in the system. A truly mature agent workflow does not throw all tasks to machines, but, like designing a production system, carefully designs its own attention architecture.
The following is the original text:
Starting more AI Agents has become very easy. But running more Agents does not mean there are more of "you." Your cognitive bandwidth cannot be parallelized. All the judgment needed to guide the Agents, assess their results, and merge their work must still pass through the same serial processor: yourself.
The so-called "Orchestration Tax" is essentially the cost you pay after forgetting this point. The only real solution is to design your own attention, just as you would design any concurrent system.
I previously participated in a roundtable at Google I/O with Richard Seroter, Aja Hammerly, and Ciera Jaspan, discussing the current state of software engineering and how it might evolve. As we neared the end, Richard asked us: What is the one thing developers should take away and change the most after listening?

I mentioned a point that I have been contemplating for the past few months: feeling busy does not equate to actually being productive. You can be running 20 Agents simultaneously and feel incredibly busy. But that does not mean you have delivered the workload of 20 Agents.
Earlier in that conversation, Richard gave a name to this issue. He said, "What you just mentioned is actually the Orchestration Tax. You cannot successfully manage 20 Agents in your own mind."
He was absolutely right. I want to unpack this concept more fully, as it is not a matter of self-discipline but rather one of architecture.
There was a statement I made almost offhandedly in that roundtable that has since stuck with me: Running multiple Agents does not mean there is another you in the world.
The Asymmetry People Don't Account For
There is a hidden asymmetry in the workflow of an Agent.
Starting an Agent is very cheap. You just need to press a key or write a prompt. But closing the loop on an Agent is not cheap at all. Someone has to check whether its results are correct and reconcile them with the changes made by other Agents.
This someone is you. And you are only one.
Last month, I wrote about part of this issue in "The Limit on Your Parallel Agents," mainly discussing that ambient anxiety: you don't know which parallel thread is quietly failing. This article aims to discuss the structure behind this cost.
When you start to view Agent development as a concurrent system, you will realize that humans themselves are just a component of this system. A very slow serial component.
You are the Single-Threaded Resource
If you have written concurrent code, you already have an intuition for this issue. You just applied that intuition incorrectly in the past.
Python (in the default CPython build) has a Global Interpreter Lock, known as the GIL. You can create as many threads as you want, but only one thread can execute Python bytecode at a time, because every thread must acquire this lock first.
You are the GIL of your AI Agents.
They can all run simultaneously. But as soon as their work requires a true understanding of the system architecture or needs to resolve a merge conflict, they must first acquire this lock. And this lock is singular, held by you.
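The analogy can be sketched with an ordinary lock. This is only an illustration: the names `review_lock` and `agent` are made up for this example, and the "work" is a placeholder string. Eight agent threads start together, but the review step only ever has one occupant.

```python
import threading

review_lock = threading.Lock()  # hypothetical stand-in for the one human reviewer
in_review = 0                   # how many results are being judged right now
max_in_review = 0               # the highest value in_review ever reached
reviewed = []


def agent(agent_id: int) -> None:
    """Produce a change in parallel, then queue for the single reviewer."""
    global in_review, max_in_review
    # Parallel phase: every agent can do this at the same time.
    draft = f"change-from-agent-{agent_id}"
    # Serial phase: judgment requires the one lock, so agents take turns.
    with review_lock:
        in_review += 1
        max_in_review = max(max_in_review, in_review)
        reviewed.append(draft)
        in_review -= 1


threads = [threading.Thread(target=agent, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(reviewed), max_in_review)  # prints "8 1"
```

All eight results do get reviewed, but never more than one at a time, no matter how many threads are started.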
Amdahl's Law states this very precisely: the maximum speedup from parallelization is bounded by the part of the work that must still be done serially. If a large portion of your process cannot be parallelized, then no matter how many cores you add, you will eventually hit a hard ceiling.
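For readers who want the numbers, here is a minimal sketch of Amdahl's Law. The 30% serial fraction is an assumption chosen purely for illustration, not a measurement from the article.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speedup when only the non-serial part is spread over workers."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed for illustration: 30% of the work is human judgment (serial).
SERIAL = 0.3
for n in (1, 2, 8, 20):
    print(n, round(amdahl_speedup(SERIAL, n), 2))

# The ceiling as workers grow without bound is 1 / serial_fraction.
print("ceiling:", round(1.0 / SERIAL, 2))
```

Under this assumption, going from 8 to 20 workers barely moves the result, and no number of workers gets past roughly 3.3x.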
In Agent development, this serial part is the judgment.
Launching 8 agents will not speed up your judgment time. It will only make the queue of tasks waiting for you longer.
This is a very old fact in performance engineering, but many people are still surprised by it: optimizing non-bottleneck parts does not increase overall throughput. You are just piling up more unfinished work in front of the bottleneck.
Adding agents optimizes the part that was not originally a constraint. The real constraint is the review stage, and the entire system's throughput happens to be equal to this stage's throughput.
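A toy simulation makes the bottleneck visible. The rates below (one result per Agent per hour, three careful reviews per hour) are assumptions for the sake of the example, and `simulate_day` is a hypothetical helper, not part of any tool.

```python
def simulate_day(agents: int, hours: int = 8,
                 results_per_agent_hour: int = 1,
                 reviews_per_hour: int = 3) -> tuple[int, int]:
    """Return (merged, still_waiting) after a day of producing and reviewing."""
    waiting = 0
    merged = 0
    for _ in range(hours):
        waiting += agents * results_per_agent_hour  # agents add to the queue
        done = min(waiting, reviews_per_hour)       # the reviewer is the cap
        waiting -= done
        merged += done
    return merged, waiting


for n in (2, 8, 20):
    merged, waiting = simulate_day(n)
    print(f"{n:>2} agents: merged={merged}, unreviewed={waiting}")
# 2 agents:  merged=16, unreviewed=0
# 8 agents:  merged=24, unreviewed=40
# 20 agents: merged=24, unreviewed=136
```

Past the point where production matches review capacity, merged output stays flat and only the unreviewed pile grows.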
The orchestration tax is the structural gap between what your Agents can produce and what you can actually merge. It appears whenever a single-threaded resource is asked to manage a concurrent system.
Brute Force Cannot Solve Structural Limits
At that roundtable, I said: I have never felt my tools were as efficient as they are now, but I have also never felt so exhausted.
Both of these feelings are completely real, and they come from the same source.
This exhaustion has a very specific origin: it is the feeling of continuously pushing a serial processor to 100% utilization without any slack.
Every time you look back at an Agent that has left your field of attention, you incur a context switching cost. You have to clear your mind and then reload a different context from scratch.
A CPU can do this in microseconds, yet architects still try to avoid frequent switches. You may need minutes, and you never restore the context perfectly.
Having 5 Agents is not a 1x workload repeated 5 times. It is 5 cold-start contextual reloads, plus a background brain process constantly worrying about which Agent you should be checking on now.
You can't "work harder" to overcome a systemic constraint. This tax must always be paid.
If you try to power through, the cost eventually shows up in one of two ways: your code reviews get shallower, or you slip into a state of "cognitive surrender," where forming your own judgment costs so much attention that you simply accept whatever the Agent wrote.
Either you pay this tax consciously, or you let it quietly erode your understanding of your own system.
Design Your Attention Like You Design Systems
So, you must treat your attention as a scarce serial resource.
You wouldn't design a distributed system without considering bottlenecks. So, please treat your brain with the same respect.
Here are some methods that have proven truly effective for me:
Scale your Agent team based on review capacity, not UI capacity.
A good concurrent system uses backpressure to prevent an ever-growing queue. Producers need to slow down to match consumer processing power.
Your Agents are the producers, and your review capacity is the consumer. The right number of parallel Agents is the number of code reviews you can genuinely focus on. For most people, that is a low single-digit number.
AI tools would gladly have you start 20 Agents, but that's just a UI feature — it doesn't mean you actually have the capacity to manage them.
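One way to picture backpressure is a bounded queue: a new Agent is admitted only when there is room for its result to be reviewed. This is a minimal sketch; `REVIEW_CAPACITY`, `try_launch`, and `finish_review` are hypothetical names, and the capacity of 3 is an assumed value.

```python
import queue

REVIEW_CAPACITY = 3  # assumed: reviews you can genuinely hold in focus
review_queue = queue.Queue(maxsize=REVIEW_CAPACITY)


def try_launch(task: str) -> bool:
    """Admit a task only if there is review capacity for its result."""
    try:
        review_queue.put_nowait(task)
        return True
    except queue.Full:
        return False  # backpressure: the producer has to wait


def finish_review() -> str:
    """Complete one review, which frees a slot for the next task."""
    return review_queue.get_nowait()


backlog = [f"task-{i}" for i in range(20)]
launched = [t for t in backlog if try_launch(t)]
deferred = len(backlog) - len(launched)
print(len(launched), deferred)  # prints "3 17"
```

The other 17 tasks are not lost. They wait until `finish_review()` frees a slot, so the queue in front of you never grows past what you can actually review.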
Classify Tasks.
When Richard asked how I handle this, I mentioned this method. I split tasks into two piles.
The first category is relatively independent work that I'm willing to delegate to an Agent running in the background in the cloud. These tasks can be executed asynchronously and usually only require a final check from me.
The second category is complex tasks where the real work is the judgment itself. For example, a very strange bug or an architectural design decision.
The biggest mistake is trying to parallelize the second category. Running several complex tasks in parallel will not increase your output; it only creates repeated contention for that lock and ultimately produces worse outcomes.
Batch Review.
Each context switch incurs a high cost. Sitting down to review the results of 4 Agents at once is much cheaper than looking at one, doing something else, then cold-starting to look at another.
Give the Agents a longer leash. Let the work accumulate a bit, then process it as a batch.
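The arithmetic behind batching can be sketched with a simple cost model. Every number here is an assumption for illustration: a cold reload after doing something else is taken to cost 10 minutes, a warm switch between items in the same sitting 3 minutes, and each review 15 minutes.

```python
def interleaved_cost(n: int, review_min: int, cold_reload_min: int) -> int:
    """Review one result, do something else, come back cold, repeat."""
    return n * (cold_reload_min + review_min)


def batched_cost(n: int, review_min: int, cold_reload_min: int,
                 warm_switch_min: int) -> int:
    """Sit down once, then move between results while still warmed up."""
    if n == 0:
        return 0
    return cold_reload_min + (n - 1) * warm_switch_min + n * review_min


# Assumed values, in minutes, for four Agent results.
print(interleaved_cost(4, review_min=15, cold_reload_min=10))       # 100
print(batched_cost(4, review_min=15, cold_reload_min=10,
                   warm_switch_min=3))                              # 79
```

The review time itself is identical in both cases. The saving comes entirely from paying the cold start once instead of four times.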
Only Use That Lock for Judgment.
Do not waste your brain on things that machines can validate on their own. Have the Agents write tests that pass or generate screenshots.
Let them prove the 80% mundane but verifiable parts themselves. This way, your scarce attention only needs to focus on the 20% that truly requires human judgment.
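As a rough sketch of that split, results can be triaged on machine-checkable evidence before they reach you. The `AgentResult` fields and the three buckets are hypothetical, invented for this example rather than taken from any real tool.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    name: str
    tests_passed: bool    # evidence the Agent produced on its own
    needs_judgment: bool  # e.g. touches architecture or an odd bug


def triage(results: list[AgentResult]):
    """Split results into: send back, light final check, full human review."""
    send_back = [r for r in results if not r.tests_passed]
    light_check = [r for r in results if r.tests_passed and not r.needs_judgment]
    full_review = [r for r in results if r.tests_passed and r.needs_judgment]
    return send_back, light_check, full_review


results = [
    AgentResult("rename-config-keys", tests_passed=True, needs_judgment=False),
    AgentResult("bump-dependency", tests_passed=False, needs_judgment=False),
    AgentResult("new-cache-layer", tests_passed=True, needs_judgment=True),
]
send_back, light_check, full_review = triage(results)
print([r.name for r in full_review])  # ['new-cache-layer']
```

Only the last bucket needs the lock. A failing result goes back to its Agent without costing you a context switch.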
Protect Your Serial Time.
The bottleneck needs your best time, not the leftover piecemeal time between multiple Agent checks.
Sometimes, the highest-leverage action is actually to completely stop orchestrating: shut down that Agent-filled computer, focus solely on thinking about a single problem, and hold that lock tightly throughout the entire process.
Orchestration is not the real work. It is merely the overhead generated around the work.
Aja pointed out that architecting is now the most urgent skill: you need to know which tasks belong with an Agent and which are too big for it.
I'd like to add one more point: you are also a component in this system. Your attention has a known, very low serial throughput. The system either respects this number, or it will quietly lower your standards to circumvent it.
Being busy does not equal being productive.
This is crucial because this failure mode is almost invisible to you personally.
Having 20 active Agents running gives you a sense of "maximum productivity." The dashboard is full, everything is moving. However, this feeling has been decoupled from actually merging high-quality code into the main branch.
You can be busy to the limit and still have almost no real output. From the inside, the two feel almost identical.
Ciera mentioned Margaret-Anne Storey's research on debt. We talked about technical debt and also about cognitive debt.
Orchestrating without paying the tax means accumulating both of these debts at once.
You are merging things you haven't read carefully. Your mental model of the codebase is completely outdated. These issues won't show up on the dashboard today. They will show up when the system fails in production: you look at the system and suddenly realize you have no idea how it actually works.
So, the real conclusion is: Starting an Agent is not a capability. Anyone can run 20.
The real capability is designing a system around the one serial resource that can be neither cloned nor parallelized.
This resource is your attention.
Design for it as you would for any critical dependency in a production environment.
