Slow Down, That's the Answer to the Age of the Agent

Bitsfull · 2026/03/29 14:00



Editor's Note: As generative AI accelerates into software engineering, industry sentiment is shifting from "capability awe" to "efficiency anxiety." Not shipping fast enough, not using AI enough, not automating thoroughly enough: each seems to create a pressure of obsolescence. Yet as coding agents truly enter production environments, more practical issues are beginning to emerge: errors are magnified, complexity spirals out of control, systems become increasingly opaque, and efficiency gains do not translate proportionally into quality improvements.


Based on frontline practice, this article offers a sober reflection on the current "agentic coding" trend. The author points out that agents do not learn from mistakes like humans do; in the absence of bottlenecks and feedback mechanisms, small issues are quickly amplified. Within complex codebases, their local perspective and limited recall capacity further exacerbate the disorder of the system's structure. The essence of these problems lies not in the technology itself but in humans relinquishing judgment and control prematurely under the drive of anxiety.


Therefore, instead of falling into the anxiety of "whether we must fully embrace AI," it is better to recalibrate the relationship between humans and tools: let agents take on local, controllable tasks while keeping system design, quality assurance, and key decisions firmly in our hands. In this process, "slowing down" becomes a capability; it means you still understand the system, can make trade-offs, and maintain a sense of control over your work.


In an era of evolving tools, what is truly scarce may not be faster generative capabilities but the judgment of complexity and the firmness to make choices between efficiency and quality.


Below is the original text:




About a year ago, a true coding agent that could help you "complete a whole project from start to finish" began to emerge. There were earlier tools like Aider and early Cursor, but they were more like assistants than "agents." The new generation of tools is very attractive, and many people have spent a lot of their spare time doing all the projects they've always wanted to do but never had time for.


I don't think this is a problem in itself. It is inherently joyful to do something in your spare time, and most of the time, you don't really need to care about code quality and maintainability. This also gives you a path to learn a new technology stack.


During the Christmas holiday season, Anthropic and OpenAI also released some "free credits," drawing people in like a slot machine. For many, this was their first real taste of the magic of "Agent coding." Participation is growing.


Nowadays, coding with an Agent is also starting to make its way into production codebases. Twelve months have passed, and we are beginning to see the consequences of this "progress." Here are my current thoughts.


Everything Is Falling Apart


While most of this is anecdotal, current software does give off a sense of being about to break at any moment. 98% uptime (more than seven days of downtime a year) is shifting from the exception to the norm, even for large services. User interfaces are riddled with absurd bugs that any QA team should have caught at a glance.


I admit that this situation existed before the Agent appeared. But now, the problem is clearly accelerating.


We cannot see the true state inside the companies, but there are occasional leaks of information, like the rumored "AI-induced AWS outage." Amazon Web Services promptly "corrected" the statement, but then immediately launched a 90-day reorganization plan internally.


Satya Nadella (Microsoft CEO) has also been emphasizing recently that more and more of the company's code is written by AI. There is no direct evidence, but the feeling is hard to shake: the quality of Windows is declining, and even some of Microsoft's own blog posts seem to tacitly acknowledge it.


Companies that claim "100% of the product is generated by AI" almost always end up shipping the worst products you can imagine. Not to point fingers, but memory leaks by the gigabyte, UI chaos, missing features, frequent crashes... none of this is the "quality endorsement" they imagined, let alone a positive demonstration of "let the Agent do everything for you."


Privately, you hear more and more people, from both large companies and small teams, saying the same thing: "Agent coding" has pushed them into a corner. No code reviews, design decisions handed over to the Agent, a pile of unnecessary features stacked on top: obviously, it's not going to end well.


Why We Shouldn't Use the Agent This Way


We have nearly abandoned all engineering discipline and subjective judgment, instead falling into an "addictive" way of working: there is only one goal—to generate the most code in the shortest time possible, with absolutely no consideration for the consequences.


You are building a scaffolding layer to command an automated Agent army. You have installed Beads without realizing that it is, in effect, malware you cannot uninstall. You are only doing it because the internet says "everyone is doing it," and if you don't, you are "ngmi" (not gonna make it).


You keep consuming your own output in a nesting-doll loop.


Look: Anthropic created a C compiler using a group of Agents. It still has issues now, but the next-generation model will surely fix them, right?


Now look: Cursor built a browser using a large group of Agents. It is basically unusable and still needs occasional manual intervention, but the next-generation model will surely handle it, right?


"Distributed," "Divide and Conquer," "Autonomous Systems," "Black Light Factory," "Six Months to Solve a Software Problem," "SaaS is dead, my grandma just set up a Shopify store using Claw"...


These narratives sound exciting.


Of course, this approach may indeed "still work" for your nearly unused (including by yourself) side project. Perhaps, there really is a genius who can use this method to create a non-garbage, truly usable software product. If you are that person, then I truly admire you.


However, at least in my developer circles, I have not yet seen a truly effective case of this method. Of course, maybe we are all just too inexperienced.


No Learning, No Bottleneck: Errors Accumulate and Explode Late


The issue with Agents is: they make mistakes. This is not a big deal in itself; humans make mistakes too. It could be just some correctness errors, easy to identify and fix, and adding a regression test makes it even more stable. It could also be some code smells that linters cannot catch: an unused method here, an unreasonable type there, some duplicated code, and so on. Individually, these are not a big deal, and human developers also make these kinds of small mistakes.


But "machines" are not humans. After repeating the same mistake several times, humans usually learn not to repeat it—either they are scolded by someone, or they change in a genuine learning process.


Agents do not have this learning ability, at least not by default. They will repeat the same mistake over and over, and may even "create" wonderful combinations of different errors based on training data.


Of course, you can try to "train" them: write rules in AGENTS.md for them not to repeat such mistakes; design a complex memory system for them to query historical errors and best practices. This is indeed effective in certain types of problems. But the premise is—you must first observe that it has made this mistake.
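The "memory system" idea can be sketched in a few lines. Everything below is hypothetical (the file name, the tag scheme, the function names are not the API of any real agent framework); it only illustrates the shape of logging an observed mistake and querying the log before the next task:

```python
# Hypothetical sketch of an agent "mistake memory". The file name, tag
# scheme, and function names are invented for illustration only.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_mistakes.json")

def log_mistake(tags: list[str], note: str) -> None:
    """Append an observed mistake so future runs can be warned about it."""
    entries = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    entries.append({"tags": tags, "note": note})
    MEMORY_FILE.write_text(json.dumps(entries, indent=2))

def relevant_warnings(task_tags: list[str]) -> list[str]:
    """Return past-mistake notes whose tags overlap the current task's tags."""
    if not MEMORY_FILE.exists():
        return []
    entries = json.loads(MEMORY_FILE.read_text())
    return [e["note"] for e in entries if set(e["tags"]) & set(task_tags)]

log_mistake(["sql", "migrations"], "Dropped an index instead of renaming it.")
print(relevant_warnings(["migrations", "schema"]))
```

Note the precondition the article points out: nothing lands in this log until a human has actually observed the mistake and written it down.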


The key difference is this: Humans are the bottleneck; Agents are not.


A human cannot churn out twenty thousand lines of code in a few hours. Even with a fairly high error rate, a human can only introduce a finite number of errors per day, and they accumulate slowly. Typically, when the pain from those errors reaches a certain level, humans (out of an instinctive aversion to pain) pause and fix things. Or they get replaced and someone else fixes them. Either way, the issue gets addressed.


However, when you deploy a whole army of well-orchestrated Agents, there is no bottleneck and no "pain sensation." These initially inconsequential little errors pile up at an unsustainable rate. You have been taken out of the loop, unaware that these seemingly harmless glitches have grown into a behemoth. By the time you truly feel the pain, it is usually too late.


Then one day you try to add a new feature and discover that the current architecture (essentially a stack of errors) cannot accommodate the change. Or users start complaining furiously because the latest release is broken, or worse, has lost their data.


It's at this point that you realize: You can no longer trust this codebase.


Worse still, the thousands upon thousands of unit tests, snapshot tests, and end-to-end tests generated by the Agents are equally unreliable. The only way left to gauge whether the system is functioning correctly is manual testing.


Congratulations, you've dug yourself (and your company) into a hole.


The Merchant of Complexity


You have completely lost sight of what is happening in the system because you've handed over control to the Agents. And fundamentally, Agents are in the business of "selling complexity." They have seen plenty of terrible architectural decisions in their training data and have continually reinforced these patterns in the reinforcement learning process. You let them design the system, and the result is as you'd expect.


What you end up with is a highly convoluted system, blending various poor imitations of "industry best practices," and you didn't impose any constraints before the problems spiraled out of control.


But the issues don't stop there. Your Agents do not share execution processes with each other, cannot see the full codebase, and have no knowledge of the decisions you or other Agents made previously. Hence, their decisions are always "local."


This directly leads to the aforementioned problems: copious duplicated code, overly abstracted structures for the sake of abstraction, all kinds of inconsistencies. These problems compound, ultimately resulting in an irredeemably complex system.


This is actually quite similar to a human-written enterprise codebase. The only difference is that there the complexity is usually the result of years of accumulation: the pain is spread across many people, none of whom reach the "must fix" threshold; the organization itself has a high tolerance; and so the complexity and the organization co-evolve.


However, in the case of a human + Agent combination, this process will be greatly accelerated. Two people, plus a bunch of Agents, can reach this level of complexity in a matter of weeks.


The Recall Rate of Agentic Search Is Low


You may hope that Agents will "clean up the mess" for you, help you refactor, optimize, and make the system cleaner. But the problem is: they can't do it anymore.


The codebase is too large and too complex, and they can only ever see part of it. This is not just a matter of the context window being too small, or long-context mechanisms failing in the face of millions of lines of code. The problem is more insidious.


Before an Agent can fix the system, it must first find all the code that needs to be modified, as well as existing implementations that can be reused. This step is called agentic search.


How an Agent does this depends on the tools you give it: Bash + ripgrep, a searchable code index, an LSP server, a vector database...


But no matter what tools are used, the essence is the same: the larger the codebase, the lower the recall rate. And a low recall rate means: the Agent cannot find all the relevant code, and naturally cannot make the correct modifications.
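A toy example makes the recall problem concrete. The file contents and query below are invented; the point is only that a literal keyword search misses relevant code that happens to use a synonym, and recall is exactly the fraction of relevant files found:

```python
# Toy illustration of recall in keyword-based agentic search.
# The files, query, and "relevant" set are invented for illustration.
def grep(files: dict[str, str], term: str) -> set[str]:
    """Literal substring search, roughly like ripgrep without regex."""
    return {name for name, body in files.items() if term in body}

def recall(found: set[str], relevant: set[str]) -> float:
    """Fraction of the truly relevant files that the search located."""
    return len(found & relevant) / len(relevant)

files = {
    "billing/invoice.py": "def charge_customer(...): ...",
    "billing/retry.py":   "def retry_payment(...): ...",   # synonym, no 'charge'
    "legacy/payments.py": "def bill_account(...): ...",    # synonym, no 'charge'
    "docs/changelog.md":  "charge flow rewritten in v2",   # matches, but irrelevant
}
relevant = {"billing/invoice.py", "billing/retry.py", "legacy/payments.py"}

found = grep(files, "charge")
print(found)                    # two matches, only one of them relevant
print(recall(found, relevant))  # 1/3: two relevant files used synonyms
```

Scale this from four files to a million lines and the effect compounds: each miss produces a reimplementation, which adds yet another synonym for the next search to miss.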


This is also why those "code smell" minor errors appear in the first place: the Agent did not find the existing implementation, so it reinvented the wheel and introduced inconsistency. These problems keep spreading and overlapping until they grow into an extremely complex mess.


So how can we avoid all of this?


How Should We Collaborate with Agents (at Least for Now)


Coding with Agents is like dealing with a sea monster: the extremely fast code generation and that intermittent but occasionally stunning intelligence draw you in. They can often complete simple tasks at astonishing speed and quality. The real problem starts the moment you think: "This thing is too powerful. Computer, do the work for me!"


Handing tasks to the Agent is not itself a problem. Good Agent tasks usually share several characteristics: the scope is well defined and does not require understanding the entire system; the task is closed-loop, so the Agent can assess its own results; the output is off the critical path (throwaway tools or internal software that will not affect real users or revenue); or you just need a "rubber duck" to think with, essentially bouncing your ideas off compressed knowledge from the internet and synthetic data.


If these conditions are met, then this is a task suitable for handing over to the Agent, provided that you, as a human, still remain the ultimate quality control.


For example, using Andrej Karpathy's auto-research method to optimize application startup time? Great. But it is essential that you understand that the code it spits out is by no means production-ready. Auto-research is effective because you have given it a fitness function to optimize around a specific metric (such as startup time or loss). However, this fitness function covers only a very narrow dimension. The Agent will boldly ignore all metrics not included in the fitness function, such as code quality, system complexity, and in some cases, even correctness—if your fitness function itself is flawed.
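The fitness-function blind spot can be shown in a minimal sketch. The candidates and their scores are invented; the point is that selection sees only the single optimized metric and is structurally blind to everything else:

```python
# Sketch of the narrow-fitness-function problem: candidates are scored
# only on startup time, so the "winner" can be the most complex one.
# All names and numbers are invented for illustration.
candidates = [
    # (name, startup_ms, complexity_score) -- complexity is NOT in the fitness fn
    ("lazy-import everything",   120, 9),
    ("precompute at build time", 150, 3),
    ("cache parsed config",      180, 2),
]

def fitness(candidate: tuple[str, int, int]) -> int:
    _, startup_ms, _complexity = candidate
    return startup_ms  # the ONLY metric the optimization loop can see

best = min(candidates, key=fitness)
print(best)  # fastest startup wins despite the worst complexity score
```

Anything you leave out of the fitness function (code quality, complexity, even correctness when the function is flawed) is invisible to the loop, so it will be traded away whenever that buys a better score.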


The core idea is actually quite simple: let the Agent do the boring, uneducational tasks, or the exploratory work you never had time to try. Then, evaluate the results, pick out the truly reasonable and correct parts, and complete the final implementation. Of course, you can also use the Agent to help with this final step.


But what I really want to emphasize is: really, slow down a bit.


Give yourself time to think about what you are doing and why. Give yourself the chance to say "no, we don't need this." Set a clear limit for the Agent: how much code it may generate per day, a quantity matched to your actual reviewing capacity. Everything that determines the system's overall shape, such as architecture and APIs, should be written by you. You can use autocomplete to keep the feel of writing code by hand, or pair-program with the Agent, but the key is: you must be in the code.
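The daily limit could be as simple as a review-budget gate. The class name, budget figure, and interface below are assumptions for illustration, not a real tool:

```python
# Hypothetical sketch of a daily "review budget" gate for agent output.
# The class name, budget figure, and interface are invented, not a real tool.
class ReviewBudget:
    def __init__(self, lines_per_day: int):
        self.remaining = lines_per_day

    def admit(self, diff_lines: int) -> bool:
        """Accept an agent diff only if a human can still review it today."""
        if diff_lines > self.remaining:
            return False  # park the diff; the agent stops generating for now
        self.remaining -= diff_lines
        return True

budget = ReviewBudget(lines_per_day=400)  # match your real reviewing capacity
print(budget.admit(250))  # True:  fits within today's budget
print(budget.admit(300))  # False: exceeds what you can actually read
print(budget.remaining)   # 150
```

The number itself matters less than the coupling: generation throughput is capped by review throughput, which is exactly the bottleneck the article argues humans provide and agent fleets lack.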


Because writing code yourself, or watching it being built step by step, brings a kind of "friction." It is this friction that makes you clearer about what you want to do, how the system works, and what the overall "feel" is. This is where experience and "taste" come into play, and this is precisely what the most advanced models cannot yet replace. Slow down, endure some friction – this is precisely how you learn and grow.


In the end, what you will have is still a maintainable system—at least not worse than before the Agent appeared. Yes, past systems were not perfect either. But your users will thank you because your product is "usable," not a pile of hastily thrown together junk.


You will have fewer features, but they will be more correct. Learning to say "no" is itself a skill. You can also sleep soundly because you still know what's happening in the system, you still have control. It is this understanding that enables you to address the recall issue of agentic search, making the Agent's output more reliable and requiring less patching.


When the system goes wrong, you can get your hands dirty to fix it; when the design was flawed from the beginning, you can understand the issue and refactor it into a better shape. Whether there is an Agent or not is not that important, actually.


All of this requires discipline. All of this is inseparable from humans.

