Is the Prompt Outdated? AI Programming is Shifting to Loop Engineering

Bitsfull2026/06/10 13:436475

概要:

An Overview of the Recent Hit, Loop Engineering


Editor's Note: The use of AI coding Agents is transitioning from "humans manually writing prompts and progressing tasks" to "humans designing loops to continuously schedule Agents." The concept of Loop Engineering, as mentioned by Addy Osmani, revolves around building a workflow that can automatically discover tasks, assign tasks, check results, record progress, and decide on the next steps.


This loop is roughly composed of five modules: Automations (timely task discovery and triage), Worktrees (isolating multiple parallel development environments), Skills (accumulating project knowledge and team conventions), Plugins/Connectors (integrating with real tools like GitHub, Linear, Slack, databases), and Sub-agents (separating executors and reviewers), along with an external memory layer such as Markdown files or a Linear board to maintain status and progress.


The article highlights that the significance of Loop Engineering is not just "letting AI run more rounds" but embedding an engineer's judgment into the system's design. The loop can significantly amplify a developer's leverage but will not replace validation, understanding, and judgment. The real risk lies not in using the loop but in treating it as an excuse to avoid comprehending code and systems. The key future capability of AI programming collaboration may no longer be just writing a good prompt but designing a reliable, verifiable, and sustainable Agent workflow.


Below is the original text:


Loop engineering is replacing your role as the "person who writes prompts for the AI." You are to design a system that allows this system to prompt the AI on your behalf. The term "loop" here can be understood as a recursive goal: you define a purpose, and AI iterates until the task is completed. It is roughly composed of five components, and both Claude Code and Codex now possess these five components.


I believe this may be the way we interact with coding agents in the future. However, everything is still in its early stages, and I remain skeptical. You absolutely need to be cautious about token costs, as the cost difference can be significant in different usage patterns, especially depending on whether you are "token-rich" or "token-poor." You also need some mechanism to ensure that quality does not degrade. Concerns about "AI garbage output" (slop) are also valid. Having said that, let's take a look at what this is all about.


@steipete recently said, "You shouldn't be writing prompts for your coding AI anymore. You should design some loops to guide your AI. Similarly, Claude Code lead @bcherny at Anthropic also said, "I don't prompt Claude anymore. I have a bunch of loops running that will prompt Claude and decide what to do next on their own. My job is to write loops."


So, what does this mean exactly?


Over the past couple of years, if you wanted your coding AI to do something, the basic approach was to write a good prompt and provide enough context. You input a sentence, read the returned result, then input the next sentence. The AI is a tool, and you have always been holding this tool, pushing it round after round. This phase has to a certain extent ended, or at least some people think it is coming to an end.


Now, what you are building is a small system: it will discover work on its own, assign tasks, check results, record progress, and then decide what to do next. In other words, you are letting this system drive the AI, rather than personally prompting it over and over. I've previously written about its "cousin" - agent harness engineering, which is setting up the operating environment for a single AI; and the factory model, which is building a software system. Loop engineering sits one layer above the harness. It is like a harness but runs on a timer, generates little helpers, and self-feeds.


What surprises me is that this is no longer just a "tooling" issue now. A year ago, if you wanted a loop, you had to write a ton of bash scripts and then maintain that stack of scripts forever. That was your own thing, and it belonged only to you. Now, these components are directly built into the product. The capabilities listed by Steinberger can almost all be mapped to the Codex app and almost equally to Claude Code. Once you realize they are the same in form, you will no longer be stuck on which tool to use, but will design a loop: regardless of which tool you are sitting in, it will continue to run.


Five Components and Some Explanations


A loop requires five things, plus a place to store information. I'll list them first and then match them one by one.


First, Automations: triggered on schedule, automatically discover and route.


Second, Worktrees: Allows two parallel working agents to not step on each other's toes.


Third, Skills: Document project knowledge to avoid the agent relying on guesswork every time.


Fourth, Plugins and Connectors: Enable the agent to plug into the tools you are already using.


Fifth, Sub-agents: One responsible for proposing solutions, the other for inspecting solutions.


And then there is the sixth thing: memory. It could be a Markdown file, a Linear board, or any place separate from a single conversation that can hold "completed items" and "next steps." It sounds so simple that it might not seem critical, but this is the same set of tricks every long-running agent depends on. I've written in detail about this in long-running agents: the model forgets between each run, so memory must be on disk, not in the context. The agent will forget, but the codebase won't.


Now, both products have these five components.



Their naming might differ in some places, but the capabilities are fundamentally the same. Below, I explain them one by one because, to be honest, whether a loop stabilizes in the end or quietly leaks everywhere, the key is in the details.


Automations: The Heartbeat of the Loop


Automations are what truly make the loop a loop, not just a one-time task you've run manually once. In the Codex app, you can create an automation task in the Automations tab, selecting the project, the prompt it should run on, the run frequency, and whether it runs in your local checkout or in the background worktree. Results of runs that find issues will go to the Triage inbox, while runs that find no issues will automatically archive, which is nice. Internally at OpenAI, it's also used for some mundane but necessary tasks like daily issue triage, summarizing CI failures, writing commit digests, and tracking bugs introduced last week. Automation tasks can also invoke a skill, so you can keep repeatable tasks maintainable: trigger $skill-name instead of pasting a wall of text instructions into a task that no one will update later.


Claude Code can achieve the same effect, but through a different path: it relies on scheduling and hooks. You can use /loop to run a prompt or command at a fixed interval, schedule a cron task, or trigger a shell command at certain points in the agent's lifecycle using hooks. If you want it to continue running after you close your computer, you can also push the whole setup to GitHub Actions. The idea is the same: you define a self-sufficient task, give it a rhythm, let the findings come to you instead of you having to check everywhere.


There is another in-session primitive worth understanding that is closer to the core of this article. /loop will run repeatedly at a rhythm; /goal, on the other hand, will continue execution until a condition you specify is met. After each round, a separate little model will determine if the task is complete, so the agent writing the code is not the one grading itself. You can give it a condition like "all tests in test/auth pass, and lint is clean" and then walk away. Codex also has a similar capability, also called /goal. It will work across rounds until a verifiable stop condition is met and supports pausing, resuming, and canceling. The same primitive, both tools have. This is essentially the pattern that keeps appearing in this article.


So, Automations are responsible for surfacing work. The rest of loop is responsible for handling this work.


Worktrees: Keeping Parallelism from Turning into Chaos


Once you are running more than one agent, file conflicts become a failure point. Two agents writing to the same file at the same time is essentially as troublesome as two engineers independently modifying the same line of code without communication. git worktree can solve this issue. It is a separate working directory on an independent branch, but shares the same codebase history, so one agent's changes are physically isolated from another agent's checkout.


Codex directly integrates worktree support, so multiple threads can work on the same repository simultaneously without colliding. Claude Code can also achieve the same isolation through git worktree: you can use the --worktree flag to open a session in an independent checkout, or set isolation: worktree on a subagent to give each helper a brand-new checkout that is automatically cleaned up after completion. I've written about this from the human side in the orchestration tax: worktrees can eliminate conflicts at the mechanical level, but you are still the bottleneck. The real decider of how many agents you can run simultaneously is not the tool, but your review bandwidth.


Skills: Let You Avoid Re-explaining the Project Every Time


Skill is a mechanism that allows you to avoid re-explaining the same project context every time, like a goldfish. The two tools use the same format: a folder with a SKILL.md file containing instructions and metadata; optionally, there can also be scripts, references, and resource files. Codex runs a skill when you use $ or /skills command, and it also runs automatically when your task matches the skill description. This is why a concise, simple description is often better than a clever, fancy one. Claude Code follows the same approach, and I have written about this pattern in agent skills.


Skills also help save your intent from being repeatedly consumed. As I mentioned in intent debt, the agent is cold-started at the beginning of each session, filling in blanks with confident guesses as long as there are blanks in your intent. A skill externalizes this intent: project conventions, build steps, "we don't do it this way because of that one incident in the past," and so on—all written in a place that the agent reads every time it runs. Without skills, the loop would have to deduce your entire project from scratch every round; with skills, it's more like compound interest.


One thing to clarify: a skill is a writing format, while a plugin is a distribution method. When you want to share a skill across multiple code repositories or bundle several skills together, you package them into a plugin. This applies to Codex and Claude Code alike.


Plugins and Connectors: Let Loop Engage with Your Actual Tools


A loop that can only see the file system is a very limited loop. Connectors, built on MCP, enable the agent to read your issue tracker, query databases, call staging APIs, or send messages in Slack. Both Codex and Claude Code support MCP, so a connector you write for one usually works in the other as well. Plugins combine connectors and skills, allowing your teammates to install a complete configuration at once, rather than reconstructing the entire setup from memory.


This is the difference between "an agent telling you 'this is the fix'" and "a loop opening its own PR, associating with a Linear ticket, and notifying the channel after CI passes." Connectors are crucial because they enable the loop to act in your actual environment, not just say, "I would do this if I could."


Sub-agents: Keeping the Maker Apart from the Checker


Within a loop, one of the most valuable structural designs is to separate the "writer" from the "checker." The writer of the code is often too lenient when grading their own homework. Having another agent with a different set of instructions, sometimes even using a different model, can catch issues that the first agent may have overlooked due to self-persuasion.


Codex only generates sub-agents when requested, which run in parallel and then merge the results back into one answer. You can define your own agents in TOML files in .codex/agents/: each agent has a name, description, instructions, and optionally a model and reasoning strength. Therefore, your security auditor could be a high-strength reasoning model, while your explorer could be a fast, read-only lightweight model. Claude Code also achieves a similar capability through sub-agents and agent teams in .claude/agents/, allowing multiple agents to pass work between each other. The most common division of labor on both sides is: one agent explores, one agent implements, and one agent validates against specifications.


I have made this point twice now: once in the code agent orchestra, and the other in adversarial code review. It is particularly crucial within a loop because the loop will run when you are not watching, so a verifier that you truly trust is the only reason you dare to leave. Sub-agents do indeed consume more tokens because each agent needs to make its own model calls and tool calls, so you should use them where "a second opinion is worth paying for." This is essentially also what /goal in Claude Code does at a lower level: it uses a new model to determine if the loop is complete, rather than having the model doing the work decide. In other words, it applies the separation of "maker" and "checker" to the stopping condition itself.


What a Loop Looks Like


Putting all these things together, a single thread will turn into a small control panel. Here is a structure I often use.


Every morning, an automation runs on the code repository. Its trigger words invoke a triage skill, reading yesterday's CI failures, open issues, recent commits, and writes the findings to a Markdown file or Linear board. For each issue worth addressing, the thread opens a separate worktree, assigns a sub-agent to draft a fix, and then assigns a second sub-agent to review this fix based on project skills and existing tests.


Connectors allow this loop to automatically open a PR and update the ticket. Anything the loop cannot handle will enter the triage inbox and be handed over to me. The state file is the backbone of the entire system: it remembers what has been attempted, what has succeeded, and what is still pending. Therefore, the morning run the next day will pick up where today's run left off.


Take note of what you have truly done. You merely designed it once. Those steps were not individually prompted by you. This is the real-world version of Steinberger's quote. Furthermore, the same loop can run on both Codex and Claude Code because the components themselves are part of the same set of components.


The Loop Still Won't Do the Work for You


The loop has changed how work is done but will not remove you from work. In fact, as the loop becomes stronger, three issues become more acute, rather than easier.


Validation still depends on you. A loop that runs unsupervised can also make unsupervised errors. The reason you separate the verifier sub-agent and maker is to give some meaning to the loop saying "completed." Nevertheless, "completed" is still a claim, not proof. I have been repeating the same phrase in "code review in the age of AI": your responsibility is to deliver code that you confirm is valid.


If you neglect it, your own understanding will still decay. The faster the loop delivers code you did not personally write, the greater the gap between what you actually understand and what truly exists in the system. This is comprehension debt. If you do not read what the loop produces, a smooth loop will only accelerate this debt growth.


Also, yes, the most comfortable posture is likely the most dangerous posture. When the loop can run itself, you are easily inclined to stop forming your own judgment and simply accept whatever it returns. I call this cognitive surrender. If you design the loop with discernment, it is the antidote; if you design the loop merely to avoid thinking, it is the accelerator. The same action yields completely opposite results.


Build the Loop, But Still Be an Engineer


I believe this foreshadows the evolution of our future work. That being said, if I do not personally review code or rely entirely on automated loops to fix code, my product quality will suffer. I am likely to fall into a downward spiral: continuously digging myself into deeper holes.


So, of course, you can go ahead and build your own loop, but don't forget, prompting your agent directly is still valid. The key is finding the right balance.


The outcome of the loop can also vary from person to person. Two individuals can build the exact same loop and yet get completely opposite results. One person uses it to accelerate in their deep understanding of work; another person uses it to avoid understanding the work itself. The loop doesn't know the difference between the two. You do.


This is why loop design is harder, not easier, than prompt engineering. Cherny's point isn't that the work has become easier but rather that the leverage point has shifted.


Build the loop. But build it like someone who still intends to be an engineer, not like someone just responsible for hitting the 'start' button.


[Original Article Link]



Welcome to join the official BlockBeats community:

Telegram Subscription Group: https://t.me/theblockbeats

Telegram Discussion Group: https://t.me/BlockBeats_App

Official Twitter Account: https://twitter.com/BlockBeatsAsia