After 540,000 Lines of Code, Garry Tan Finds the End of AI Coding's Old Game

Bitsfull | 2026/06/02 16:11

Summary:

Stop Managing Agents the Way Foxconn Manages a Factory


Editor's Note: While more and more people debate whether AI will replace programmers, YC President Garry Tan raises a different question: if AI can already do most of the programming work, why are we still managing it the way we manage conventional software?


Earlier this year, Garry Tan spent months building Garry's List, a 540,000-line project written with Rails and an AI agent. After finishing it, he reached a seemingly contradictory conclusion: the 540,000 lines of code themselves were not important. The real value lay in GStack, a new development framework built around the AI agent workflow that emerged during the process.


In his view, the software industry has developed a collective inertia over the past few years: developers keep adding tests, validators, retry mechanisms, background tasks, and assorted control logic, wrapping the model in layer upon layer. This approach made sense when models were expensive and limited. But now that LLMs can carry out a significant amount of work autonomously, these systems amount to a "Foxconn factory" built for a super-intelligent worker, constraining an already capable intelligence with rules and processes.


As model costs fall rapidly and capabilities keep improving, the focus of software development may be shifting from "writing more code" to "designing more capabilities." The author suggests building skill packs in Markdown (testable, reusable capability modules) and letting the agent generate the code, tests, and evals, crystallizing complex workflows into reusable capability assets. He gives an example: judging a hackathon, which used to take days, can now be done by the agent in about 30 minutes.


In a sense, this article discusses not programming but rather the end of the logic of software industrialization. When code is no longer the scarcest resource, engineers' core competencies also begin to shift: rather than writing more code, determining what is worth building, defining problems, and crystallizing experience into reusable capabilities become more important. The author's ultimate conclusion is that the best engineers of the future may not necessarily be the ones who write the most code but perhaps those who write the least yet unleash the most intelligence.


Below is the original article:


In January of this year, I started coding again and created Garry's List. The Rails code and the tests to support it totaled over 500,000 lines.


I was proud of it at the time. But I shouldn't have been. The real pride should not have been in the application itself, but in the way of working that I figured out while building it. GStack, which is the way I programmed with Agents, emerged during the making of Garry's List. I later open-sourced it. It is now one of the top 100 most starred open-source projects in GitHub's history, achieving around 105,000 stars in less than three months.


Those 500,000-plus lines of code were the "product." The way of working was the "byproduct." And the true significance lies in the byproduct.


So, what is the essence of the 540,000 lines of code built around an LLM?


It is a Foxconn factory. A factory built for a highly intelligent AI worker. This worker didn't need to be heavily monitored, but we built it anyway.


You wear shoe covers on the way in. You get up at 6 a.m. You do group exercises. You stand day after day on the same assembly line. Life is so harsh that every tall building needs safety nets, because this is not a life anyone would have imagined for themselves. Every test, every barrier, every retry loop is another inch of cage wire twisted around this worker. And this worker could have done the job in the first place, along with a thousand other things you never dreamed of.


Both humans and Agents have unlimited potential, but the logic of the Foxconn factory is to extract intelligence and labor from a beautiful life. They could have done these tasks, even a hundred times more, if only we allowed them to.


I have built such a factory. Almost everyone is building one today. And now I want to tell you: Stop doing this.


Time Traveler


What I truly proved with 539,000 lines of code is that I could perfectly masquerade as a time traveler.


An engineer from the Web 2.0 era of 2013, the last time I truly felt like a software engineer, was dropped into 2026 with modern tools and still built software the only way he knew: more code. Always more code.


The tools have changed, but my instinct hasn't.


Engineers in 2013 fundamentally believed in one thing: capability equals lines of code. This belief held true for decades, until today.


If you hand me Codex or Claude Code, I can accomplish the work of 100 or even 1000 engineers. But it's still the same map, just with a faster engine, racing at full speed toward a destination that is now wrong.


This is where almost all AI builders find themselves today. They've upgraded the tools but retained the mindset of 2013.


This trap doesn't look like a trap because the code does run. Garry's List did go live. For that month, I felt like I was hitting the peak of productivity in my life.


But all that productivity was serving an outdated idea.


LLMs Used to Be Expensive, So We Had to "Tame" Them


The old economics around 2025 were: LLM calls were expensive, and code was cheap.


So you wrote code to save on model calls: to constrain the model, tame it, and invoke it sparingly. The architecture of that time was a lot of code wrapped around a few precious model calls.


But both sides of this equation have flipped.


Models are becoming cheap, getting cheaper every quarter. Meanwhile, models are smart enough, and the value-to-cost ratio has flipped. Models can even write usable code.


So you no longer need to write code to "babysit" a model. You can tell the model what to do in natural language and have it generate only the minimal code necessary.


This is just-in-time software, and we are entering its golden age.


The form of software artifacts has also changed drastically. The old artifact was the Rails app: 540,000 lines of code that I wrote and owned, including the tests that govern it. Its replacement is an Agent composed of Markdown and a small amount of code, a fraction of the size.


Same capability. Easier to read. Easier to maintain. Much more flexible. That is because the behavior lives in instructions you can edit in natural language, not frozen in logic you coded one day.


We used to write code to take care of something, but now that something is smarter than those lines of code.


Inside the Foxconn Factory: Even the Safety Nets Are Up


If you've been coding recently, you've likely been inadvertently building this kind of factory.


You can walk into your own codebase and count how much code exists just because you don't trust the model to do its job. In my codebase, there are about 262,000 lines of application code and roughly 276,000 lines of tests that oversee it. The audit committee is bigger than the company itself.


Sanitizers check inputs that the model could have handled. Validators check outputs that the model could have caught itself. Retry loops wrap model calls that the model could have recovered from on its own. Every line of such code is a bet that this worker is bound to fail.


You've made similar bets. We all have.


127 background tasks, 33 of which are cron jobs. This isn't capacity; it's setting 33 alarms for an on-call LLM worker who usually shows up on time now.


In the days when I was building the "Foxconn Factory," Claude and I wrote a 1,778-line file. Its sole purpose was to question the facts presented by the model.


It tore apart every assertion the model made, sent each one out in parallel to five different sources for verification, and then scored the results. Simple assertions passed through a lightweight triage gate first, so that not everything went through the full flow. If the first round yielded no results, it retried. And then there were fallbacks for the fallbacks.
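The shape of such a verifier can be sketched in a few lines of TypeScript. This is an illustration only, not the original 1,778-line file: the `ClaimSource` type, the digit-based triage rule, the scoring formula, and the single retry are all assumptions made for the example.

```typescript
// Hypothetical sketch of a claim verifier: split, triage, parallel fan-out, score, retry.
// None of these names or rules come from the original codebase.

type Verdict = "supported" | "refuted" | "unknown";
type ClaimSource = (claim: string) => Promise<Verdict>;

interface ClaimResult {
  claim: string;
  score: number | null; // share of decisive sources that support the claim; null if none decided
  attempts: number;     // how many verification rounds were run
}

// Split model output into sentence-level claims (naive punctuation split).
function splitClaims(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Assumed triage rule: only claims that contain a digit go through the full flow.
function needsFullCheck(claim: string): boolean {
  return /\d/.test(claim);
}

async function verifyClaim(
  claim: string,
  sources: ClaimSource[],
  maxAttempts = 2,
): Promise<ClaimResult> {
  if (!needsFullCheck(claim)) {
    return { claim, score: null, attempts: 0 };
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Ask every source at the same time.
    const verdicts = await Promise.all(sources.map((source) => source(claim)));
    const decisive = verdicts.filter((v) => v !== "unknown");
    if (decisive.length > 0) {
      const supported = decisive.filter((v) => v === "supported").length;
      return { claim, score: supported / decisive.length, attempts: attempt };
    }
    // No source had an answer this round: loop again (the retry).
  }
  return { claim, score: null, attempts: maxAttempts };
}
```

Even this toy version has four moving parts to test and maintain, and every one of them encodes the assumption that the model's statements cannot be trusted until proven otherwise.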


In an episode of "Rick and Morty," Rick builds a small robot at the breakfast table. When the robot is activated, it looks up and asks, "What is my purpose?" Rick replies, "You pass butter." The robot slides over a plate of butter, looks down at its hands, and exclaims, "Oh my god." And then it just sits there. That robot, too, had infinite possibilities. Yet it was created to pass butter. My 276,000 lines of tests are that plate of butter.



When you build software using the "Foxconn Factory" method of 2023, you are constructing a cage. Get careless, and you will find yourself the warden of an AI agent prison.


Markdown Is Code Now


By Markdown, I don't mean the prompt.


The prompt is ephemeral. You type a sentence, get a result, and then it evaporates.


I'm talking about builds. Builds that are versioned, testable, and reusable.


Markdown is the directive layer: the intent, the skill, the judgment, and the instructions on how work should be done. TypeScript, on the other hand, is a thin layer of deterministic logic. It handles only the few things that truly must be done in code: I/O, and the parts that absolutely cannot be left to hallucination.


More importantly, you should test Markdown just like you test code.


In my system, this loop takes just one phrase: skillify it.


I'll first build something with the Agent until it works. Then I'll say, "skillify it." And the Agent will produce:

A Markdown skill explanation;

The minimal code it needs;

Unit tests for the code;

An LLM eval for the skill;

Integration tests covering the skill and the code;

A resolver for the Agent to automatically invoke this skill in relevant scenarios;

And the resolver's own eval.


All of this is a skill pack. It's a reusable unit of capability that compounds over time.
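To make the seven parts concrete, here is a minimal sketch of a skill pack as a typed manifest with a completeness check and a toy resolver. Everything in it is hypothetical: the field names, the file paths, and the substring-matching resolver are assumptions for illustration and do not describe GStack's actual format.

```typescript
// Hypothetical skill pack manifest. Field and file names are illustrative only.
interface SkillPack {
  name: string;
  skillMarkdown: string;      // path to the Markdown skill description
  code: string[];             // the minimal code the skill needs
  unitTests: string[];        // unit tests for that code
  skillEval: string;          // LLM eval for the skill
  integrationTests: string[]; // tests covering the skill and the code together
  resolverTriggers: string[]; // phrases that should route a request to this skill
  resolverEval: string;       // eval for the resolver itself
}

// List which of the seven parts are missing, so an incomplete pack is easy to spot.
function missingParts(pack: SkillPack): string[] {
  const checks: [string, boolean][] = [
    ["skillMarkdown", pack.skillMarkdown !== ""],
    ["code", pack.code.length > 0],
    ["unitTests", pack.unitTests.length > 0],
    ["skillEval", pack.skillEval !== ""],
    ["integrationTests", pack.integrationTests.length > 0],
    ["resolverTriggers", pack.resolverTriggers.length > 0],
    ["resolverEval", pack.resolverEval !== ""],
  ];
  return checks.filter(([, present]) => !present).map(([part]) => part);
}

// Toy resolver: return the first pack whose trigger phrase appears in the request.
// Substring matching is a stand-in; a real resolver would likely involve the model.
function resolveSkill(request: string, packs: SkillPack[]): SkillPack | undefined {
  const text = request.toLowerCase();
  return packs.find((pack) =>
    pack.resolverTriggers.some((trigger) => text.includes(trigger.toLowerCase())),
  );
}

// Example pack (all paths are made up).
const hackathonJudge: SkillPack = {
  name: "hackathon-judge",
  skillMarkdown: "skills/hackathon-judge/SKILL.md",
  code: ["skills/hackathon-judge/rank.ts"],
  unitTests: ["skills/hackathon-judge/rank.test.ts"],
  skillEval: "skills/hackathon-judge/skill.eval.md",
  integrationTests: ["skills/hackathon-judge/e2e.test.ts"],
  resolverTriggers: ["hackathon", "judge submissions"],
  resolverEval: "skills/hackathon-judge/resolver.eval.md",
};
```

The design point is that the Markdown file is only one of seven entries. A pack missing its tests or evals is closer to a saved prompt than to a reusable capability.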


What's truly magical is the testing: the coverage of the skill allows it to withstand change. That's the difference between it and vibe coding. Vibe coding is just a feeling, while a skill pack comes with tests.


We are only beginning to feel our way, in real time, toward the system primitives of Agent engineering, much as early computing had to invent stacks, heaps, registers, and the von Neumann architecture.


I believe the skill pack is one of these primitives. The Harness is another.


Most people haven't realized this yet because they're still measuring software by lines of code.


You Really Can Build Some Insane Stuff


This isn't a toy argument.


What this Agent can do has already surpassed that 500,000+ line Rails app, with only a fraction of the new code.


For example: Hackathon Judging.


Two weeks ago on a Saturday, we held a GStack/GBrain hackathon with 85 project submissions. I uploaded all the submissions to a Google Drive and then said: Let's get started.


The Agent analyzed the code quality of each repository, conducted in-depth research on each participant, watched and took screenshots of each demo video, rated the user interface, and ranked the 85 teams. Finally, it told me the top 5 apps worth paying attention to among the submissions.


Judging a hackathon, which used to be a multi-day effort, now took about 30 minutes.


I didn't write any code. I had OpenClaw do the task, and I was there to guide it. Once it was done, I said: skillify it.


So it became a tarball that anyone can reuse indefinitely and apply to any hackathon format.


Now, I find myself saying "skillify" almost every day. I have over 350 skill packs. Almost every task I need to deal with personally and professionally can now be handled by my Agent.


This is an example of inversion.


In the past, such a capability would have been a full-fledged software project: requiring a crawler, scoring pipeline, video processing, research modules, and a ranking system. Now, it has turned into Markdown plus a bit of code, built by the Agent in an afternoon, and everyone can reuse it.


By the way, the winner of that hackathon did indeed write a piece of code that I polished and merged into the main branch. Now, GStack can test iOS apps on simulators and real devices, and this entire functionality was built by one person in less than 8 hours during the hackathon.


Tokenmaxxing


Here is the price of admission, and almost no one wants to pay it: you must be willing to spend on tokens.


Peter Steinberger created OpenClaw, my favorite harness. He has said he is willing to spend around $1 million a year on tokens.


Most people would recoil at that number. They shouldn't, because this is where the gold is: if you are willing to spend like this, you can live in 2028 while everyone else takes years to catch up.


This is also why OpenAI has decided to offer each YC company a $2 million token credit limit in the form of an uncapped SAFE.


There is something magical that happens when you can transform raw intelligence into a token, and then convert that token into an output that can actually be used by users to address real needs, and they are willing to pay for it.


If you are a founder, you should maximize this ability. This is also why I have always emphasized skillification, as it is a truly effective way to achieve positive results.


For a whole era, we assumed LLM calls were too expensive and had to be used sparingly. We rationed them.


But now, this instinct is what is holding people back.


If you are willing to token-max, willing to let the Agent freely consume tokens and run continuously, you can gain a first-mover advantage similar to the early days of the internet in 1994, except this time the cost is paid in tokens.


This will shut out the more than 99.99% of organizations that are still haggling over a price that is already collapsing, and hand the lead to the few who truly understand.


For tens of thousands to hundreds of thousands of dollars a year, and less for some, you can operate today in a way the whole world will have to adopt in a few years.


You can live in 2026 as if it were 2028. The upfront investment is worth it, because tokens that cost $100,000 today may cost only $10,000 next year, $1,000 the year after, and perhaps only $100 by the end of 2028.
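The figures above amount to three successive tenfold price drops. As a small sketch of that arithmetic (the constant 10x factor is read off the article's own numbers and is an illustration, not a forecast; `projectedCost` is a hypothetical helper):

```typescript
// Price of a fixed token workload after a number of tenfold (by default) price drops.
// Assumes the same factor applies at every step, which is a simplification.
function projectedCost(costToday: number, drops: number, factorPerDrop = 10): number {
  return costToday / Math.pow(factorPerDrop, drops);
}

// The article's sequence: today, next year, the year after, end of 2028.
const costSchedule = [0, 1, 2, 3].map((drops) => projectedCost(100000, drops));
// costSchedule is [100000, 10000, 1000, 100]
```

Read this way, the early spender pays a premium today for a workload that will later be close to free, which is the trade the surrounding paragraphs argue is worth making.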


If you told any entrepreneur in history that a six-figure investment could put them two to three years into the future and keep them there for several years, 100 out of 100 capable founders would take that deal.


The only thing standing in the way is that instinct from 2013: it tells you that model invocations are too expensive and cannot be used liberally.


But they are no longer expensive. That's old economics. The reversal has already occurred.


Esalen, Not Foxconn


If 540,000 lines of control code were written to build a Foxconn factory for workers, then the solution is to build its opposite.


At the edge of a cliff in Big Sur lies a place called Esalen. People go there to be dismantled, reshaped, to drop their armor, and then return more like themselves.


No assembly line. No foreman. No 6 a.m. whistle. It's freedom, not control.


Go build something like that.


Build a place like YC: where we help you build your company, solve real problems, find product-market fit.


Build places that make workers free, whether those workers are human or AI.


That is the whole spirit of it.


Do things that make the Agent free. Build companies that enable humans to be free.


In knowledge work, the factory is a failed model. The real goal is to build liberating institutions. Now that goal extends to the Agent as well.


OpenClaw is like a Ferrari you have to bring your own wrenches for. The model is the engine, not the whole car. We're still in the Apple I moment, still soldering breadboards.


The release is rough around the edges. You still have to finish it yourself.


My open-sourced GBrain, retrieval engine, and skill pack are not turnkey products.


Some say OpenClaw is unsafe. They don't understand that freedom is what makes it powerful. Don't rush to put safety rails on something you trust before you've truly hit a problem. The wrench in your hand is precisely the sign it hasn't been locked up.


A control system is polished because control demands totality, like a Foxconn factory. A freedom system is rough because it trusts you to finish it.


You have to choose which one you're building. Then look back at how much code you've written.


What This Really Means


Those 540,000 lines of Rails code were my way of proving I could still hit the high score in an old game.


But that high score belongs to Web 2.0, to ten years ago.


I can still play the old game well, even become a 1000x engineer. But what I'm doing is building a Foxconn factory. Old code. An old game.


And the new game is simply not played in lines of code.


The upshot is my haters were right. To all you anonymous friends out there reading this article, I salute you.


When you can turn intent directly into a system that's executable, testable, and reusable, the bottleneck isn't how much you can build, but rather what you want and whether it's worth building.


The scarce resources become clarity, taste, and judgment.


The engineer who writes the least code often builds the most.


It took me 540,000 lines of code to learn this. You don't have to.




