Editor's Note: Against the backdrop of continued hype around AI investment and industry narratives, the question of "whether there is a bubble" has become a central topic of market debate. On one hand, narratives of extreme risk keep stoking fears of runaway technology; on the other, rapid capital expenditure and high valuations have kept the "bubble" thesis alive. Faced with this divergence, the market's judgment remains deeply uncertain.
The author of this article, Ben Thompson, is the founder of the technology analysis platform Stratechery and has long focused on the structural evolution of the technology industry and its business models. On the occasion of NVIDIA GTC 2026, he revised his earlier judgment on whether AI is in a bubble: he no longer sees the current situation as a bubble, but as a round of structural growth driven by a technological paradigm shift.
This judgment rests on three key leaps in large language models (LLMs). Since ChatGPT first demonstrated LLM capabilities to the market in 2022, the models have evolved from "usable but unreliable," to "capable of reasoning," to "able to carry out tasks independently." In particular, by the end of 2025, with the release of Anthropic's Opus 4.5 and OpenAI's GPT-5.2-Codex, agentic workloads began to move from concept to reality.
The key lies not in the model itself but in the emergence of the agent harness. The harness decouples the user from the model: it schedules the model, calls tools, and validates results, transforming AI from a tool that requires continuous human intervention into an execution system that can be entrusted with tasks. This change not only improves reliability but also expands the range of what AI can be applied to.
Building on this paradigm shift, the author further argues that the expansion of AI demand is no longer determined by the number of users but by how much work each user can schedule; at the same time, agentic workloads have a "winner-takes-all" character, which will keep driving up demand for high-performance compute and create structural opportunities for chip makers and cloud providers.
Within this framework, today's large-scale capital expenditures are no longer mere speculative bets on the future but more likely a preemptive reflection of real demand. As AI shifts from an "assistance tool" to "execution infrastructure," its economic impact may be only beginning to show.
Original Text:
In the past, I leaned toward the view that this is a bubble, and even thought that at certain stages a bubble might not be a bad thing.
But standing here in March 2026, at the opening of NVIDIA GTC, my judgment has changed: this might not be a bubble. (Ironically, that judgment may itself be a signal of a bubble.)
Three Paradigm Shifts in LLMs
Over the past few weeks, while discussing NVIDIA's and Oracle's earnings reports, I have mentioned several times that LLMs have undergone three key paradigm shifts.
Phase One: ChatGPT
The first inflection point was the release of ChatGPT in November 2022, which almost goes without saying. Although Transformer-based large language models had existed since 2017 and their capabilities had been improving steadily, they were consistently underestimated. As late as October 2022, I was still saying in a Stratechery interview that while the technology was impressive, it lacked productization and entrepreneurial momentum.
However, everything changed a few weeks later. ChatGPT made the world truly aware of LLM's capabilities for the first time.
Yet those early versions also left two lasting impressions, ones the "bubble theorists" in particular keep reiterating:
First, the model often made mistakes and would even "hallucinate" answers it did not know. This made it feel like a showy tool: amazing, but unreliable.
Second, it was nonetheless very useful, but you had to know how to use it, constantly validating outputs and correcting errors.
Phase Two: o1
The second inflection point was OpenAI's release of the o1 model in September 2024. By then, LLMs had progressed significantly thanks to stronger base models and post-training techniques, producing more accurate outputs with fewer hallucinations.
But o1's key breakthrough was that it would "think" before answering.
Traditional LLMs are path-dependent: once they veer off course mid-reasoning, they keep going in the wrong direction. This is a fundamental weakness of autoregressive models. A reasoning model, by contrast, assesses its own answers: it generates an answer first, judges its accuracy, and, if necessary, tries other paths.
This means the model starts actively managing its own errors, reducing the burden of user intervention. The results are striking. If ChatGPT's breakthrough was making LLMs usable, o1's breakthrough was making LLMs reliable.
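To make the pattern concrete, here is a minimal conceptual sketch in Python. Reasoning models run this loop internally with reasoning tokens rather than through external code, and `generate` and `evaluate` here are hypothetical stand-ins for illustration, not real APIs:

```python
def reason(prompt: str, generate, evaluate, max_attempts: int = 3) -> str:
    """Conceptual generate-assess-retry loop, in the spirit of o1-style models."""
    answer = ""
    for _ in range(max_attempts):
        answer = generate(prompt)                # propose a candidate answer
        ok, critique = evaluate(prompt, answer)  # self-assess its accuracy
        if ok:
            return answer                        # confident: stop here
        # Not confident: fold the critique back in and try another path,
        # rather than compounding the error autoregressively.
        prompt += f"\nPrevious attempt was wrong ({critique}); try a different approach."
    return answer                                # best effort after retries
```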
Phase Three: Agent (Opus 4.5 / Codex)
By the end of 2025, the third leap emerged.
In November 2025, Anthropic released Opus 4.5 to an initially lukewarm reception. By December, however, Claude Code running on the model suddenly exhibited unprecedented capabilities; almost simultaneously, OpenAI released GPT-5.2-Codex, showing a similar level of performance.
People had been talking about "agents" all along, but only now did they actually begin to complete tasks, even complex ones taking hours, and complete them correctly.
The key lies not in the model itself but in the control layer, the harness. Users no longer interact with the model directly; they hand the agent an objective, and the harness schedules the model, calls tools, executes processes, and validates the results.
Using programming as an example:
· Phase One: Model generates code
· Phase Two: Model reasons through the generation process
· Phase Three: Agent generates code → automatically runs tests → retries on failure, with minimal ongoing user intervention (see the sketch after this list)
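A minimal harness sketch in Python illustrates the Phase Three loop. This assumes `model_call` is a hypothetical function returning a self-contained script with its own assertions; real harnesses such as Claude Code are far more elaborate:

```python
import os
import subprocess
import sys
import tempfile

def agent_loop(task: str, model_call, max_iters: int = 5) -> str:
    """Generate -> run -> retry until the generated code executes cleanly."""
    feedback = ""
    for _ in range(max_iters):
        code = model_call(task + feedback)   # model proposes code with self-checks
        fd, path = tempfile.mkstemp(suffix=".py")
        with os.fdopen(fd, "w") as f:
            f.write(code)
        result = subprocess.run(             # the harness, not the human, runs it
            [sys.executable, path],
            capture_output=True, text=True, timeout=60,
        )
        if result.returncode == 0:
            return code                      # validated: hand the result back
        # Feed the failure back to the model and retry; no human in the loop.
        feedback = f"\n\nPrevious attempt failed:\n{result.stderr[-2000:]}"
    raise RuntimeError("agent exhausted its retry budget")
```

The human supplies only the objective; the loop itself supplies the constant intervention that Phase One demanded from the user.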
This means that the core limitations of the ChatGPT era are being systematically addressed, leading to higher accuracy, stronger reasoning abilities, and automatic validation mechanisms.
The only remaining question is: What should it be used for?
The Falling Threshold of "Proactiveness"
I keep returning to these three inflection points because they explain why the entire industry faces a severe compute shortage, and why massive capital expenditure is justified.
The three paradigms have vastly different compute requirements:
· Phase One: Training-intensive, but inference costs are low
· Phase Two: Soaring inference costs (more tokens + higher usage frequency)
· Phase Three (Agent): Multiple calls to inference models, Agent itself consuming compute (potentially CPU-heavy), further explosion in usage frequency
But the third point matters most: the shift in the structure of demand is severely underestimated.
Today, far more people use chatbots than agents, and many underutilize AI altogether. That is because using AI requires proactiveness: an LLM is a tool with no objectives and no will of its own; it must be invoked deliberately.
Agents change that by lowering the requirement for human agency: one person can now command multiple agents simultaneously (see the sketch below).
This means that even a handful of people with agency is enough to drive significant compute demand and economic output.
AI still requires a "human driver," but no longer needs "many humans."
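As a toy illustration of that multiplier, one person can fan objectives out to many agents concurrently. Here `run_agent` is a hypothetical stand-in for a full agentic task, each of which would consume large amounts of inference compute:

```python
import asyncio

async def run_agent(objective: str) -> str:
    # Stand-in for hours of tool calls and model inference per objective.
    await asyncio.sleep(0)
    return f"done: {objective}"

async def main() -> None:
    objectives = [
        "triage the bug backlog",
        "draft the quarterly forecast",
        "migrate the billing API",
    ]
    # One human, N agents: compute demand scales with N, not with headcount.
    results = await asyncio.gather(*(run_agent(o) for o in objectives))
    print(results)

asyncio.run(main())
```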
Enterprises as the Real Payers
It has become increasingly clear that consumers' willingness to pay for AI is limited. The real payers for productivity are enterprises.
What excites enterprises most is not merely AI improving efficiency, but AI replacing labor, and doing the work more efficiently.
The current reality inside large corporations is that those truly driving the business forward are often few, yet the organizations are large, which creates significant coordination costs. The agent's role is to amplify the influence of those value-driving individuals while reducing organizational friction.
The result is "fewer people → higher output → lower costs." This is also why future layoffs may not only be "cyclical adjustments" but rather structural changes.
Companies will ask not only "did we hire too many people during the pandemic?" but also "in the AI era, do we simply not need this many people?"
Why Is This Not a Bubble?
From this perspective, the logic of "not being a bubble" becomes clearer:
1. The core flaws of LLMs are being steadily fixed by more compute and better architecture
2. The number of people required to drive demand is shrinking
3. The gains agents bring are not just cost reduction but also revenue growth
It is therefore not hard to understand why every cloud provider says computing power is in short supply, and why they keep raising capital expenditures.
Agents and the Restructuring of the Value Chain
Another key question: if models eventually become commodities, can OpenAI and Anthropic still make money?
The traditional view is that they cannot, but agents change that. The real value lies not in the model itself but in the integration of model and control system.
Profits tend to flow to the integration layer rather than to replaceable modules. Apple's hardware, for example, has resisted commoditization because of its deep integration with software. Likewise, agents require deep synergy between model and harness, which makes OpenAI and Anthropic key integrators in the value chain rather than interchangeable parts.
Microsoft's shift is a telling signal: it originally emphasized "swappable models" but had to abandon that stance after launching a true agent product.
This means models may never be fully commoditized, because agents demand integrated capabilities.
The Final Paradox
I must return to the paradox at the beginning.
I have always believed that as long as people are still worried about a bubble, it is not a bubble; a true bubble is when no one questions it anymore.
And now, my conclusion is: this is not a bubble.
But if the very act of me saying "this is not a bubble" proves it is a bubble, then so be it.
