Editor's Note: Against the backdrop of continuous leaps in AI model capabilities and large-scale enterprise adoption of tools such as Claude Code and Cursor, industry discussion is shifting from "how powerful the model is" to "how the model enters production." However, as AI programming, automated analysis, and data modeling become the new consensus, a more fundamental question is emerging: as execution costs fall rapidly, what is truly scarce? Human resources, capital, or access to cutting-edge models and tokens?

This article is based on a conversation between Patrick O'Shaughnessy and SemiAnalysis founder Dylan Patel. Dylan, who has long focused on AI infrastructure, the semiconductor supply chain, and model economics, starts from his own company's skyrocketing Claude Code spending and discusses how AI is changing enterprise organization, information services, token demand, the compute supply chain, and public sentiment.
What is most noteworthy about this conversation is not that some model has once again topped a benchmark, but that it offers a way to understand the AI economy: as a production system that is reallocating execution capability, organizational efficiency, and industry profits, rather than merely a software-tool upgrade.
This conversation can be understood from five main angles.
First, execution costs have collapsed. In the past, ideas were not scarce; the real challenge was turning ideas into products, systems, and deliverable services. Now, with Claude Code, non-technical individuals can write code, build applications, and perform data analysis; tasks that once required a long-term team are being completed by a few people with the help of models. SemiAnalysis's annual spending on Claude Code has reached $7 million, more than a quarter of its payroll, a sign that AI is no longer just a productivity tool but is becoming a new form of capital for enterprises.
Second, the information services industry is the first to be rewritten. Dylan's business essentially sells analysis, consulting, and datasets, the area most easily commoditized by AI. Tasks such as chip reverse engineering, grid modeling, and macroeconomic indicator development, which once required a team's long-term investment, can now be turned into a usable product by a few people in a matter of weeks. The pressure AI puts on information-services companies is therefore not "will it replace people" but "who can rebuild a rival's product fastest." Companies that do not adopt AI will be commoditized faster by those that do, and even adopters must keep raising the bar to avoid being displaced by the next, more efficient wave.
Third, tokens are becoming a new means of production. When an enterprise bought software subscriptions in the past, the core question was whether the tool was usable; now, access to cutting-edge models, rate limits, enterprise contracts, and token budgets directly determine production capacity. A more powerful model does not necessarily mean higher cost, because a smarter token can complete a higher-value task in fewer steps. Real competition is shifting from "who is using AI" to "who can access the strongest model and deploy the most expensive tokens in the highest-value scenarios."
Fourth, this demand will cascade down the entire supply chain. Surging token usage will put sustained pressure on GPUs, CPUs, memory, FPGAs, PCBs, copper foil, semiconductor equipment, and wafer-fab capital expenditure. The "bullwhip effect" discussed below is precisely this logic: a downstream increase that looks like nothing more than model-call demand can translate upstream into orders, capacity expansions, and price hikes that are magnitudes larger. As a result, the AI industry's profits will not stay confined to model companies and NVIDIA but will keep spilling over along the semiconductor and data-center supply chain.
Lastly, the social backlash against AI may come sooner than expected. As AI becomes genuinely embedded in workflows, public concern about job displacement, energy consumption, data-center expansion, and the concentration of power will rise with it. Dylan even predicts that large-scale protests against AI could emerge within three months. For model companies, insisting that "AI will change the world" may not ease this anxiety; it may instead reinforce the public's sense that things are out of control. What the AI industry needs to demonstrate next is not just technical capability but how it creates specific, tangible public value in the present.
Today, the core issue with AI is shifting from "what models can do" to "who can access models, how to use models, and who can capture the value created by models." In this sense, the subject of this article is no longer just Claude Code, Anthropic, or a single AI company but a structural reorganization unfolding around productivity, capital expenditure, organizational efficiency, and societal acceptance.
The following is the original content (slightly reorganized for better readability):
TL;DR
· The core variable of AI is shifting from "can we do it" to "is it worth doing." After a steep drop in execution costs, the real scarcity is high-value ideas that can be scaled by models.
· Claude Code spending representing 25% of payroll costs is just the beginning; AI is transitioning from a software tool to an enterprise's new means of production.
· The competition for cutting-edge models is no longer just about capabilities but about token acquisition rights; whoever can access the most powerful models earlier and more consistently may establish new business moats.
· The information services industry will be the first reshaped by AI: the production cost of data, analysis, and research is falling rapidly, and slow movers will be commoditized by faster ones.
· Token demand will not slow down due to price drops of older models, as with every model improvement, new high-value use cases are unlocked, propelling users towards more expensive cutting-edge models.
· The most significant change brought by AI is not to make people work less but to enable a few to achieve several times the output in the same time frame; those unable to create and capture token value will be stuck in a "permanent underclass."
· The scarcity of computational power is spreading throughout the entire semiconductor supply chain, from GPUs, CPUs, and memory to PCBs, copper, and equipment manufacturers, with AI demand becoming a pricing force across the whole industry chain.
· The economic value of AI is challenging to capture in traditional GDP measures; the real issue is not how much money model companies make but how much "phantom GDP" the decisions, efficiency, and cascading effects generated by tokens have actually created.
Original Interview:
Claude Code Has Become the New Workforce
Patrick O'Shaughnessy (Host):
You told me a fascinating story before about the significant shift in your team's token usage this year. Could you tell it again? How did it change your understanding of what's happening in the world?
Dylan Patel (Founder of SemiAnalysis):
Last year, we thought we were heavy users of AI. Everyone was using ChatGPT, everyone was using Claude, and I provided my team with all sorts of subscriptions they wanted. At that time, the company's spending in this area was around the tens of thousands of dollars.
However, this year, expenses began to soar. The real inflection point was probably at the end of last December, with the emergence of Opus. A big part of that was Doug, our CEO Douglas Lawler, who has been leading the charge to get non-technical people coding with AI; he slowly brought the whole company along. The engineers were already using it, of course, but starting in January of this year our spending turned clearly upward and then surged quickly.
Later on, we signed an enterprise contract with Anthropic. The last time I talked to you, our annualized spending was about $5 million; now it has reached $7 million.
Patrick O'Shaughnessy:
And that was just last week's figure.
Dylan Patel:
Yes, a large part of that is the usage itself. What's really interesting is that people who have never coded before are now using Claude Code, and some can spend thousands of dollars in a day. But from the company's perspective, our annual spending on Claude Code has now reached $7 million, while our payroll spending is around $25 million. In other words, spending on Claude Code has exceeded 25% of the payroll spending.
If this trend continues, by the end of the year, it may even surpass the total payroll amount. It's a bit frightening. Luckily, I don't have to choose between "people" and "AI" right now because the company is growing rapidly. It's more like: I don't need to hire as quickly, but I can spend more on AI, and it does work effectively, helping the company grow faster.
But I think other companies will sooner or later face this question: if one person using Claude Code can do the work of 5, 10, or even 15 people, what happens next? Part of the answer may indeed be layoffs; another part is that the usage scenarios are extraordinarily diverse.
For example, we have a reverse engineering lab in Oregon that has been in operation for a year and a half. It is equipped with many high-end devices, such as microscopes and scanning electron microscopes. The core purpose of this lab is to reverse-engineer chips, extract chip architecture, and analyze the materials used in their manufacturing process. This is also one of the data sets we sell.
But analyzing this type of data used to be a very slow process. Now, someone on our team, for just a few thousand dollars' worth of Claude tokens, created an app. The app is GPU-accelerated and runs on servers we host at CoreWeave. All you have to do is send it a picture of a chip, and it automatically marks the position of each material in the image: here is copper, here is tantalum, here is germanium, here is cobalt. Then you can very quickly run finite element analysis on the entire chip stack, fully visualized, complete with a graphical interface and dashboard.
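Editor's note: the app itself is not public, so the sketch below only illustrates the pixel-labeling step Dylan describes; the per-material reference intensities are hypothetical stand-ins for real SEM/EDX signatures, and all names are invented.
```python
import numpy as np

# Hypothetical reference signatures (mean grayscale intensity in an
# electron-microscope image); a real pipeline would use EDX spectra.
MATERIALS = {"copper": 200.0, "tantalum": 150.0, "germanium": 110.0, "cobalt": 60.0}

def label_materials(image: np.ndarray) -> np.ndarray:
    """Assign each pixel the material whose reference intensity is closest."""
    names = list(MATERIALS)
    refs = np.array([MATERIALS[n] for n in names])        # (M,)
    dist = np.abs(image[..., None].astype(float) - refs)  # (H, W, M)
    return np.array(names, dtype=object)[dist.argmin(axis=-1)]

# Usage: a fake 4x4 "chip cross-section"
img = np.array([[205, 148, 62, 111]] * 4, dtype=np.uint8)
print(label_materials(img)[0])  # ['copper' 'tantalum' 'cobalt' 'germanium']
```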
This person used to work at Intel, and he said that in the past, this used to be something a whole team would do and maintain. Now, when you look at similar things across the entire company, it's simply incredible.
Another example that I found particularly interesting is Malcolm. He used to be an economist at a major bank. The bank's economics department probably had 100 to 200 people. The work he is producing now is truly amazing.
He has brought in various data sources, including FRED data, employment reports, and other datasets from different APIs. We have also signed contracts with some data providers to get API access. Then he pulled in all the data, started running regressions, and analyzed how different economic changes affect inflation or deflation.
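Editor's note: as a rough illustration of this kind of workflow, not Malcolm's actual model, pulling a few FRED series and regressing inflation on them might look like the following; it assumes the pandas_datareader and statsmodels packages, and the series and specification are illustrative.
```python
import pandas_datareader.data as web
import statsmodels.api as sm

start = "2000-01-01"
# Illustrative series: CPI (inflation), unemployment rate, M2 money stock.
cpi = web.DataReader("CPIAUCSL", "fred", start).pct_change(12) * 100
unrate = web.DataReader("UNRATE", "fred", start)
m2 = web.DataReader("M2SL", "fred", start).pct_change(12) * 100

df = cpi.join([unrate, m2]).dropna()
df.columns = ["cpi_yoy", "unrate", "m2_yoy"]

# Regress year-over-year inflation on labor-market slack and money growth.
X = sm.add_constant(df[["unrate", "m2_yoy"]])
print(sm.OLS(df["cpi_yoy"], X).fit().summary())
```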
The U.S. Bureau of Labor Statistics has a whole set of task classifications, with around 2000 tasks. Malcolm used AI to evaluate which tasks can now be done by AI, and which cannot, and he scored them based on a rubric. The results showed that about 3% of tasks can now be completed using AI.
So he created a metric to measure which tasks can be done by AI, and when these tasks are done by AI, how much deflationary effect it would have. Output may increase, but because costs are decreasing so significantly, theoretically GDP might actually contract. He calls this "Phantom GDP."
Based on this concept, he conducted a comprehensive analysis and established a new language model benchmark, consisting of approximately 2000 evals.
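Editor's note: the "Phantom GDP" arithmetic can be sketched in a few lines. The task list, volumes, and prices below are invented; the point is that measured spending can fall even while output holds or rises.
```python
tasks = [
    # (task, annual units, old cost/unit, AI-doable?, new cost/unit)
    ("draft earnings summary", 10_000, 50.0, True, 2.0),
    ("field equipment repair", 10_000, 50.0, False, 50.0),
]

old_spend = sum(n * old for _, n, old, _, _ in tasks)
new_spend = sum(n * (new if ai else old) for _, n, old, ai, new in tasks)

# Value of the same output at old prices minus what is now measured.
print(f"measured spend: ${old_spend:,.0f} -> ${new_spend:,.0f}")
print(f"phantom GDP: ${old_spend - new_spend:,.0f}")  # $480,000 unmeasured
```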
Patrick O'Shaughnessy:
Is he doing all of this by himself?
Dylan Patel:
Yes, he's doing all of this by himself. He told me, "Bro, this is something that would have taken a 200-person team of economists a year to do in the past." He is now fully immersed in Claude and says everything has changed.
Patrick O'Shaughnessy:
As a business operator, how do you make sense of this? You went from almost no such expenditure to now it's close to 25% of the payroll expenditure, and it's still rising. At what point do you think, "Wait a minute, should I hit the brakes? Should I rein in the spending? Maybe we don't always need to use the cutting-edge model that was just released today, like Opus 4.7, but could switch to a slightly cheaper model?"
Dylan Patel:
At the end of the day, what I do is the information business. We sell analytics, do consulting, and also create datasets. I can't see any reason why these things won't be fully commoditized at a fairly rapid pace.
Take the first data product we ever sold: more people are now doing similar things. The only reason we can still sell it is that we keep making it better and more granular. But the way we built it in 2023 is not very different from what others can do today. If I don't keep raising the bar, I will be commoditized. If I don't move fast enough, I will lose my edge.
So the question is: yes, AI will commoditize a lot of things, just as it is commoditizing software. But those who move fast enough, own the client relationships, and keep providing and improving an excellent service will not shrink; they will grow faster. Those who are complacent and do nothing will lose.
So this is actually a bit of a survival issue: if I don't adopt AI, others will, and then they will beat me.
Another very simple example is the energy sector. We've had several energy analysts for about a year trying to build an energy model. The model is very complex, and energy data services are roughly a $9 billion market, so it's obviously a huge market I'd like to get into. But despite having people on our team work on it for a year, we never really entered the energy data services business.
Then, the "Claude Code psychosis" kicked in. We had someone on our team responsible for data center energy and industrial operations, named Jeremy. After starting to use Claude Code, things suddenly changed. In three weeks, he spent a lot of money, probably around $6,000 a day, which was really exaggerated. But he grabbed every power plant in the U.S., every transmission line above a certain voltage level, and built a map of the entire U.S. power grid from various public data sources, all while accessing a lot of demand-side data.
We turned it into a dashboard that lets you view and analyze power shortages and surpluses in individual micro-regions of the U.S., along with many other details. The whole thing was stood up in a matter of weeks.
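Editor's note: a minimal sketch of the supply/demand aggregation such a dashboard rests on; the regions, schemas, and megawatt figures are invented stand-ins for the public generator and load datasets described.
```python
import pandas as pd

plants = pd.DataFrame({
    "region":      ["ERCOT", "ERCOT", "PJM", "PJM"],
    "capacity_mw": [1200.0, 800.0, 2500.0, 900.0],
})
demand = pd.DataFrame({
    "region":       ["ERCOT", "PJM"],
    "peak_load_mw": [1800.0, 3600.0],
})

# Aggregate generation per region and compare to peak load.
supply = plants.groupby("region", as_index=False)["capacity_mw"].sum()
grid = supply.merge(demand, on="region")
grid["surplus_mw"] = grid["capacity_mw"] - grid["peak_load_mw"]
print(grid)  # ERCOT: +200 MW surplus, PJM: -200 MW deficit
```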
Later, we showed it to some clients who had already purchased our data center dataset, including energy traders. After seeing it, they said, "Wow, how long did this take? This is great, even better than Company X." Upon further investigation, we found out that "Company X" has 100 people and has been working on this for ten years.
Of course, our current product is not as comprehensive or robust as theirs, but in some respects it is already better. So now I am the one commoditizing these energy data services companies. On the flip side, if I don't move faster, who will come along and commoditize me?
So, from a business owner's perspective, the question is not "Did I spend a lot of money?" Yes, I did spend a lot of money. But the question is, what did this money bring me? Did it bring in more revenue? If the answer is yes, then that money is worth it.
Patrick O'Shaughnessy:
Are you worried that ultimately, those who control capital, those responsible for investing capital, the very people who often hire you for what you do will say, "We have our analysts too, and they are also smart, so why don't we just do it ourselves?" If this becomes so easy, at what point will it all flow back internally to the investment firms? After all, they are most likely to leverage the most from this data and insight.
Dylan Patel:
First of all, any information service business is fundamentally like this: the value I derive from a piece of information is clearly not as great as the value the customer derives from that information.
If I sell you information for $1, the reason you are willing to spend that $1 is that you know the information can help you make a decision, and that decision can earn you more than $1. In other words, you have an arbitrage opportunity: you make more money from the information than I make from selling it.
Investment funds, of course, have information capabilities of their own. Institutions like Jane Street and Citadel in particular go very deep and granular on data. But these institutions still buy our data, they keep buying, and our collaboration keeps growing.
I think there is some kind of "it factor" here. We move faster, more flexibly, with a smaller team, and we focus on a very specific area: AI infrastructure and the massive changes it sparks, including AI, tokenomics, and the whole suite of related things. We can see the direction earlier and build things faster.
So, investment professionals will certainly try to do some of the things we do themselves. But most of the time, they will directly purchase our data and then build on that. For them, buying our data and building on top of it is usually cheaper than starting from scratch. Of course, eventually, someone will definitely try to do it themselves.
Tokens Are Becoming the New Means of Production
Patrick O'Shaughnessy:
I feel like every time I chat with you, I end up circling back to the same question: the supply and demand of tokens. It's currently the most fascinating thing in the world to me. Have your own experiences given you any new insights into the demand side? Has your perception of token demand shifted after experiencing it firsthand?
Dylan Patel:
If we take a step back and look at it from a macro perspective, Anthropic's ARR may have already grown from $9 billion to around $35-40 billion. By the time this episode airs, it may have even reached $40-45 billion.
However, their compute has not grown by the same magnitude. If you do the math, and assume they haven't cut their R&D compute, which they clearly haven't, because they are still releasing new models like Metis, Opus 4, and Opus 4.7, that tells you one thing: even if all of their added compute went to inference, it puts a floor under their gross margin of around 72%.
In reality, some of the added computational power likely also goes into R&D, so their actual gross margin may be even higher than 72%. It's worth noting that earlier this year, someone leaked part of their funding documents, indicating a gross margin of just over 30%.
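Editor's note: the margin inference here is back-of-envelope division. The numbers below are invented to match the 72% figure, not Anthropic's actual financials: if all incremental compute were serving inference, revenue minus that compute cost bounds the gross margin from below.
```python
arr = 40e9                       # annualized revenue run-rate, $ (assumed)
inference_compute_cost = 11.2e9  # all added compute billed to inference, $/yr (assumed)

gross_margin_floor = 1 - inference_compute_cost / arr
print(f"gross margin floor: {gross_margin_floor:.0%}")  # 72%
# If some of that compute actually went to R&D rather than serving
# customers, the true serving margin would be higher than this floor.
```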
How can a business lift its gross margin this much in such a short period? In principle, because demand is simply too high. They can tighten usage quotas, rate limits, and all kinds of restrictions. Most crucially, you need an Anthropic account manager, you need an enterprise contract, and you need to be able to get the rate-limit increases you need; otherwise, tokens will inevitably be extremely scarce.
Those who can afford it will get it. Anthropic faces the same situation, which is not really a challenge but simply how capitalism works. Yes, customers may be paying them $40 billion a year in token fees, but the value those tokens create for customers far exceeds $40 billion.
Each token creates a different value for different companies. But as models become more intelligent, what really matters is: who can obtain these most intelligent tokens and use them for the most valuable purposes.
As an individual, what you have to decide is how to use these tokens to grow your business and create value. Many people will want tokens and will consume tokens. But the average SaaS startup in San Francisco building software products with Claude may not actually be creating much value. Sooner or later, they will be priced out of tokens.
Patrick O'Shaughnessy:
Today, on a flight, I ran into exactly this. As soon as Opus 4.7 was released, I wanted to use it immediately, but I was rate-limited and couldn't use it at all. And I couldn't fathom going back to 4.6, even though I had been quite satisfied with 4.6 over the past few weeks; it was already powerful.
Are you surprised by how people are so determined to use the most expensive, cutting-edge models?
Dylan Patel:
Not surprised at all. One of the funniest memories I have from the past month and a half is when my friend Leopold and I almost knelt in front of one of the Anthropic co-founders, begging him to grant us access to Metis.
We knew it existed, so we were like, "Please, let us use it." And then he said, "I don't know what you are talking about."
Patrick O'Shaughnessy:
What was your reaction when the price list, or as you say, the eval card, came out?
Dylan Patel:
Actually, rumors were circulating around the Bay Area even before it came out, and we had a rough idea that it was going to be very powerful. If you look at the benchmarks, and of course benchmarks keep evolving, Metis is likely the most significant leap in model capability in the past two years.
I think this is the crucial point: it's so powerful that Anthropic doesn't even want to release it fully. They have disclosed pricing to some clients and done selective releases, for example in cybersecurity scenarios, where the token cost might be five to ten times higher. But they still don't want to unleash it entirely, because of the potential real-world impact.
So what we have been given is a lesser, weaker version: Opus 4.7. They explicitly note in the model card that they intentionally degraded its cybersecurity capabilities. I don't know if you read that part.
So what I'm saying is: whoever you are, as long as you have enough capital, you should go buy Anthropic's enterprise subscription, pay by token, rather than using those regular subscriptions. This way, you won't be as easily rate-limited.
Then you have to think about how to deploy these tokens on the highest-value tasks and make money from them. Because fundamentally, a year or two from now, many businesses will essentially be doing token arbitrage. Tokens are powerful, but the key is knowing where to direct them.
A few years down the line, the model itself may know how to use the token, how to create the most value.
If you look back at any benchmark, you will find: in the past, reaching a certain level of capability required cost X, now it may only require one hundredth of the original cost, or even one thousandth. For example, when DeepSeek reached the level of GPT-4 capability, the cost was approximately one-sixth of GPT-4. Since then, the cost of a GPT-4 level model has continued to decrease.
Of course, no one really cares about GPT-4-level models anymore. What everyone wants is cutting-edge models, because those are what create real economic value. GPT-4-level models still have uses, but those use cases are usually smaller.
So what is truly driving demand is not that old capabilities have become cheaper, but that new use cases keep emerging. You are currently using a model at the level of Opus 4.6 or Opus 4.7. A year from now, if I were to attain the same quality model capability as today, my expenses may only be $70,000, perhaps 100 times cheaper.
But that's not important. By then, I will definitely be using a more powerful model to do more valuable things.
Patrick O'Shaughnessy:
Anthropic's Metis, even though it is a more expensive model, consumes far fewer tokens to accomplish the same task. So on most tasks, it is actually cheaper than Opus 4.6.
Dylan Patel:
Because it is much more efficient. Even though each token itself is "smarter" and more expensive, the number of tokens needed to complete a task is much less.
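Editor's note: the "smarter but cheaper per task" arithmetic is simple; the prices and token counts below are invented for illustration.
```python
def task_cost(price_per_mtok: float, tokens: int) -> float:
    """Cost of one task at a given price per million tokens."""
    return price_per_mtok * tokens / 1e6

cheap_model = task_cost(price_per_mtok=15.0, tokens=2_000_000)  # many steps
smart_model = task_cost(price_per_mtok=75.0, tokens=300_000)    # fewer steps

print(f"cheaper-per-token model: ${cheap_model:.2f}")  # $30.00
print(f"pricier, smarter model:  ${smart_model:.2f}")  # $22.50
```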
Patrick O'Shaughnessy:
The last time I saw you, Metis might have just been released, or the model card had just come out. You said at the time that it was so powerful that it scared you a bit. What did you mean by that?
Dylan Patel:
Anthropic's goal for 2025, set as early as 2024, was to have the model perform like an L4-level software engineer by the end of 2025. With Opus 4.6, they have basically achieved it.
But what they didn't say is that if you look at Metis and then compare benchmarks, it is more like an L6 engineer. L4 is probably a relatively junior software engineer, while L6 is already a pretty experienced engineer.
I remember Anthropic mentioning that this model was internally available around February. In other words, within two months, they went from L4 engineers to L6 engineers. What will happen next?
When you think about model improvement, you will find that it is actually accelerating. Anthropic's release cadence is shrinking, and so is OpenAI's. Why? Because generally, to build better models, you need a few things.
First, you need significant computational power. Computational power is very expensive and has its own time scale. We will keep track of these things, and it is indeed growing, but in the short term, it is mostly set. The computational power you have signed up for is essentially fixed. Of course, there may be delays and adjustments along the way, and there may be efforts to get a little more, but overall, it is quite stable.
Second, you need exceptionally talented researchers. Companies are now willing to pay tens of millions of dollars for these individuals.
Lastly, there is execution. Historically, execution has always been very hard: if I have an idea, I still have to build it, and building is hard. But now ideas are everywhere, and execution has become very easy. Expensive, but straightforward.
So the question becomes: how do you decide which ideas to execute? When execution becomes this easy, you can execute more ideas and run faster on this treadmill.
This can happen in AI model research, so the model release cadence has shortened from six months to two months. It can also happen in other fields. For example, I want to model every power plant and transmission line in the U.S., run regressions, and analyze the supply-demand dynamics at a microscale—I can now do that too.
Ideas themselves are cheap. The key is: which idea is meaningful? Which idea is worth buying tokens for and bringing to life? The execution ability is already there. That is the most critical change.
And execution costs keep falling, they really are falling, and we haven't even gotten to Metis yet. Opus 4.7 was released just a few hours ago, and our team is already very excited.
What will this bring to the world next? I think it will reorder the way the economy operates.
In the past, execution was what mattered, because it was difficult and ideas were cheap. Now ideas are still cheap and abundant, but execution has become easy too. So the only things truly worth doing are the ideas good enough to prove that, even with execution this cheap, they are still worth spending money on.
Patrick O'Shaughnessy:
So are you actually afraid? Or is it more that this introduces a hard-to-grasp uncertainty?
Dylan Patel:
Uncertainty definitely exists. But I do feel a certain fear. The question is, how will society reorganize itself?
When you live in a world where the ability to accomplish something is no longer as important, what truly matters? What matters is whether you can come up with the right idea for AI to execute; whether you can sell that idea, or sell what AI has produced; whether you can raise capital for this direction. That's what becomes important.
This also brings us back to the earlier question: It's crucial to always have the latest model. So who can access the latest model?
Anthropic has a project, I know it's not called Earwig, but I like to humorously call it Earwig to tease people at Anthropic. They only provide Metis to certain companies for cybersecurity scenarios. I believe this kind of thing will continue to happen: the deployment scope of models will become narrower and less consumer-facing.
I know OpenAI, Anthropic, and other companies all say they want everyone to have powerful AI. But AI is very expensive. Who will foot the bill for the trillions of dollars in infrastructure? It will be those who are wealthy and can build useful things with AI.
Plus, you don't want others to distill your model, so you won't release it on a large scale. You will provide it to an increasingly smaller group of clients. And then, these clients will also start competing for tokens.
Unless Anthropic significantly raises prices. They could easily double the price of Opus and I would still pay; I dare say most users would keep paying. But I don't think that solves their massive capacity problem either.
So the question becomes: Where will this cycle end? What will happen when the usage of tokens, and the additional value these tokens bring, become increasingly concentrated in the hands of a few companies?
I don't have Metis right now. But who does? The top banks do. Right now, they might just be using it for cybersecurity, but I can imagine a world where, because I have an enterprise contract with Anthropic and because the folks at Anthropic like me enough, they might be willing to grant us slightly earlier access or a slightly higher rate limit. I certainly hope that happens.
Then my competitors don't have that access, and I can beat them.
It could also play out differently. For example, someone like Ken Griffin from Citadel, with strong connections and a lot of money, might go and sign a deal with OpenAI or Anthropic, saying, "I'll buy $100 billion worth of tokens first every year. Whenever you release a new model, I'll buy the first $100 billion worth of tokens, and then others can use them."
What would happen then? He could potentially dominate the market.
That's just one example. It could also happen in the field of cybersecurity, where Anthropic is concerned that the model might make it easier for people to hack into systems. It could also happen in information services like mine, where I use it to crush others.
I think the impact of this is very broad. We don't know what these models can really do. Anthropic doesn't know, OpenAI doesn't know, no one knows. Ultimately, it will be up to end users to discover: where can these tokens actually be used? What can they build? What can they imagine?
This will certainly greatly increase productivity, which is also very positive for humanity. But the question is, how will resources and usage rights be concentrated?
Robots Will Drive the Next Wave of Demand
Patrick O'Shaughnessy:
Right now, the tokens consumed by robots or robotics are almost negligible compared to other areas. What are your thoughts? Will it become the next demand curve? Within a mile of here, there are new robotics startups emerging every day, trying to do something interesting.
Dylan Patel:
There's a concept called the "software-only singularity": the idea that the world might first see an AI singularity that happens only in software. But most of the world is still physical, and the world will ultimately organize around hardware, not just software. So I think the software-only singularity will be a transient phase, not the endgame, because eventually we have to enter the physical world.
Once software becomes very easy, what is still genuinely hard for robots? The programming, the microcontrollers, the actuators, and controlling all of it. All of that is still very difficult today.
AI models have an interesting property: their learning efficiency is actually very low. It is only because we have fed them massive amounts of data that they have learned so much and, in some respects, surpassed humans.
Current robot models, such as VLAs (Vision-Language-Action models), are very popular right now, but I don't think they can ultimately keep scaling. Their data efficiency is low, and we cannot expand robot data fast enough.
In the future, there will surely be a way to pre-train robot models at scale, just as humans keep absorbing data throughout their lives. Humans' real strength is that we are very sample-efficient: with one or two examples, we can learn.
If this ability is applied to robots, the situation will be completely different. Once there is a software singularity, implementation becomes very cheap, and anyone can start building these models. Next, people can start building truly useful robots.
So I believe that in the next 6 to 18 months, we will start to see real breakthroughs in the field of robotics. The key ability is few-shot learning. By then, there will be a pre-trained robot model, and then you hire or purchase a robot, show it a few examples, and it can complete the task.
You tell it to stack these two things together, and it can do it. You tell it, "This thing can actually balance," and it will start trying and succeed. Trust me, I have knocked things over plenty of times myself.
So I believe that robots will have the ability for few-shot learning.
There are indeed many companies working on robots now, some for advertising displays, some for very simple tasks. But next, it will become very specialized. For example, a robot specifically for folding clothes, or even more specialized, a robot specifically for erasing blackboards. It might be a rental service, or it might be a model package that you download onto a standard robot, and it can perform this task, and then you pay per use.
Anyway, the physical goods sector is going to experience a huge acceleration, creating a deflationary effect. This will ultimately continue to drive insane growth in token demand. So personally, I don't think token demand will slow down.
Patrick O'Shaughnessy:
From the results of Metis and the way it was built, did you learn anything new about the world? In other words, if we break down the various components of scaling laws, such as the pre-training aspect...
Dylan Patel:
It is a much larger model than its predecessors. 100,000 Blackwells are equivalent to hundreds of thousands of previous-generation chips. Of course, TPU and Trainium have their own release cadences, so it's not a perfect comparison. But ultimately, yes, Metis is a significantly larger model, and it proves that scaling laws still hold. Everything it has shown says the trend continues: put more compute into the model, and the model improves.
Moreover, it's not just "more compute makes the model better." We are simultaneously gaining compute-efficiency improvements. All the R&D compute poured into the labs translates into one thing: the cost of reaching a given capability level drops sharply every six months, or now every two months. And if I scale up massively on top of that, I still get a huge leap in capability.
So, yes, it proves this is still happening. Google and Anthropic are not heavy users of GPUs on the training side. OpenAI should also be launching a new-generation model next; I think they are taking a more measured, principled, incremental approach to scaling, whereas Anthropic took a huge leap this time.
This year, we will see better and better models, and the pace of releases will only accelerate.
Patrick O'Shaughnessy:
We've been talking for a long time in this conversation, but we've hardly mentioned OpenAI. This would have been very strange in the past.
Dylan Patel:
That's the interesting part. Now, many people would say: So Anthropic has already won, right? They had Metis in February, but they didn't even release it because they felt it wasn't necessary. Their computing power has sold out, and their revenue is increasing by $10 billion every month. And today they released Opus 4.7, and all of this is happening before the rumored Spud release from OpenAI—media like The Information have reported on this rumor.
So at first glance, Anthropic seems clearly ahead, and OpenAI appears to be in trouble. What's interesting, though, is that Anthropic is very clearly compute-constrained, with limited ability to scale. Dario had previously needled OpenAI for being too aggressive with its compute investment while Anthropic scaled more rationally. But now Anthropic might be thinking: maybe we should have had more compute from the start.
OpenAI, on the other hand, is fully capable of footing these bills. In fact, they have raised a lot of money to acquire more incremental compute power. Additionally, they have made very aggressive, even somewhat "irresponsible," scale-ups in the past, purchasing compute power from companies like Oracle, CoreWeave, SoftBank, and Microsoft. Now they have also obtained Trainium from Amazon.
So OpenAI has done a very crazy thing in terms of compute power, and they know they need even more.
What's interesting is that if we look at Opus 4.6, setting aside models getting stronger and focusing only on the diffusion of the technology: you and I might start using a model the day it's released, but other enterprises need time, and people need time to learn. The "Claude awakening moment" won't hit everyone at once. So by the end of the year, for a model at the Opus 4.6 level, if the entire economy is willing to spend $100 billion a year on it, I don't think that's an exaggeration. After all, we are already spending $400 billion now.
Patrick O'Shaughnessy:
This is essentially just linear extrapolation.
Dylan Patel:
Yes, this is linear extrapolation, not exponential. To get exponential growth, you need better models. But Anthropic won't have enough compute to meet those demands. So, assuming OpenAI or Google quickly reach the same level of capability, whoever gets there next can capture it.
Anthropic may be able to charge a 70% gross margin, but if OpenAI is next to reach the same level of capability, even if it takes only a 50% gross margin, it will capture all of that incremental demand. And it probably doesn't have enough compute to serve all users either. A model like Metis, if there were enough compute in the world, could bring in $500 billion in revenue, or even more. The market demand for these tokens is too strong, and the supply of compute is extremely limited.
We have already seen this in the skyrocketing prices of GPUs, such as the H100. The lifespan of GPUs continues to extend. It is evident that even second-tier labs are selling out of their tokens, not to mention top-tier labs. Top-tier labs will have better profit margins, but second-tier labs are also selling out, and even third-tier labs may be close to selling out.
The economic value that the strongest models can create is growing faster than the ability of the infrastructure to provide these tokens to people. Therefore, this gap will continue to widen. The profit margins of model labs will also continue to rise until those in the hardware supply chain and infrastructure supply chain react: wait, why don't I just increase my own profit margins?
Patrick O'Shaughnessy:
So it's fair to say your read on the demand side today, especially your own example at SemiAnalysis, is that it is simply explosive. And more broadly, as people enter what you called an "AI psychosis" state and feel what they can do, experiencing that the difficulty of getting things done has almost completely disappeared, I have had that experience myself. In just a few weeks, my own token spending has skyrocketed.
This sounds like a pretty good demand-side read. So what else are we missing on the demand side? You've said that if you don't use more tokens, you will never escape the "permanent underclass." Can you elaborate on that?
In other words, one path is to use more tokens and create excess economic value through them. But many people's current usage is boring and lazy. They might think, "Then in the future I'll only work one hour a day instead of eight, and let AI do most of the work for me."
Dylan Patel:
That's the boring way. The cooler way is: I still work eight hours a day, but I get eight times as much done and maybe make five times the money. Perhaps not exactly five times, but that should be the direction.
Of course, if you only have one job, that's hard. Some people will work multiple jobs at once; some will start companies or start selling things. Before everyone is using AI and it becomes the industry standard, you need to seize the economic value AI brings, because it is not yet fully mainstream. If you don't use more tokens, create value from those tokens, and capture that value, you cannot escape the permanent underclass.
There are actually three distinct steps here: first, using more tokens; second, creating value with those tokens; and third, capturing part of the value you create. If you can't do all three, then as model capabilities keep soaring and resources concentrate further, you may never escape the permanent underclass.
Let's talk about the supply side. What exactly is happening now? If the demand curve is exploding, what is happening across the entire supply chain to deliver all these tokens? With demand skyrocketing, everything on the supply side is being repriced upward. Whether it's NVIDIA GPUs or other components, prices are rising. At the same time, useful lifespans are being extended.
This is precisely the price trend of the H100. In the past, some believed that the effective lifespan of GPUs was less than five years, which is completely untrue. Now, some Hopper clusters from three to four years ago are renewing contracts for another three or four years; some A100 clusters are also signing contracts for the coming years.
So, the effective lifespan of GPUs is obviously not five years, perhaps even seven or eight years. We don't know for sure; we'll have to wait until Hopper actually reaches that stage to see. But it's clear that it's not five years. Moreover, prices are rising during the renewal.
This means the gross margin of a cluster is actually not 35%, but higher. Cloud margins are expanding. Hardware margins are also very healthy, with NVIDIA still holding roughly a 75% gross margin. Further along the supply chain, memory margins have clearly risen sharply, and areas like optical modules and logic chips are seeing substantial prepayments, with margins slowly climbing.
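Editor's note: a toy version of the cluster-margin arithmetic, with invented numbers. Stretching the same GPU's accounting life from five to seven years cuts the annual capital charge and lifts the margin, before any rental-price increase at renewal.
```python
capex_per_gpu = 30_000.0    # purchase price, $ (assumed)
opex_per_year = 3_000.0     # power, cooling, ops, $/GPU-yr (assumed)
rental_per_year = 11_000.0  # rental revenue, $/GPU-yr (assumed)

def cluster_margin(lifespan_years: float) -> float:
    """Gross margin after spreading capex over the GPU's useful life."""
    annual_cost = capex_per_gpu / lifespan_years + opex_per_year
    return 1 - annual_cost / rental_per_year

print(f"5-year life: {cluster_margin(5):.0%}")  # 18%
print(f"7-year life: {cluster_margin(7):.0%}")  # 34%
```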
More importantly, companies like NVIDIA are making huge prepayments to the firms that manufacture their chips. So even where gross margins have not risen much, cash-flow timing and return on invested capital are improving.
You can see this throughout the entire supply chain. ASML has sold out completely and now needs Carl Zeiss to ramp up production faster. Along the supply chain, each link is either sold out, leading to margin increases, or receiving prepayments, thereby improving the return on invested capital because the actual amount of capital needed is lower.
This is a consistent trend throughout the entire supply chain, even down to PCBs. Manufacturing PCBs requires copper foil, and even copper foil has sold out, leading to people making prepayments for it.
You could say that anything with a "pulse" in this supply chain, anything that is sold out, has people rushing to secure incremental supply and lock in capacity for years ahead.
The Compute Shortage Is Spreading Through the Entire Supply Chain
Dylan Patel:
The supply chain usually reacts quickly. But this time, there is a unique aspect: today's supply chain is more complex than ever before, and what we are building is also more complex than ever before, so the delivery cycle is longer. It's not that other industries haven't seen 18-month delivery cycles before, but this time, the construction of additional supply itself takes several years.
That's how memory works. Memory capacity can only grow by roughly 20% to 30% a year; NAND grows more slowly, DRAM slightly faster. Even though the demand signal was very strong by the end of 2025 and memory companies responded immediately, real additional capacity will not arrive right away.
In addition to the usual annual 20% to 30% growth, they can of course squeeze a bit more capacity. But true incremental supply will not appear until 2028. It could possibly be by the end of 2027, but most likely in 2028. This is very unique. Even if they wanted to ramp up production at the fastest rate, supply will not come immediately.
The result is that memory prices have already skyrocketed. And let me tell you, DRAM in particular will at least double, and could triple or even quadruple.
Some may say, "The memory story has been told over and over, everyone understands." But actually, no, you haven't truly understood. DRAM could still double or triple from now on because the required capacity is so significant. They have to snatch capacity from elsewhere. And in a capitalist economy, the only way to snatch capacity from elsewhere is by destroying some demand through higher prices. We are not operating under a rationing system, so this will definitely happen in the end. Profit margins will continue to rise.
I believe logic chips also face a serious capacity problem. TSMC just reported earnings, and they keep raising capital expenditure. But building fabs takes a long time, so they are doing everything they can to squeeze more output from every existing fab. TSMC hasn't raised prices sharply, though, because they are the "good guys": their hikes are probably single-digit percentages, not triple-digit percentage increases like the memory makers'.
So in the end, you will see a market like this: TSMC is a great company, but will it really extract all the value? Maybe not.
I mentioned some of this earlier: the copper foil, fiberglass, and lasers needed for PCBs. These are relatively well-understood but highly specialized supply chains, and they are under tremendous strain right now. Going further upstream, to semiconductor wafer fabrication equipment, I still believe the market severely underestimates its importance, even after the big run-up.
TSMC's capital expenditure guidance for this year is $56 billion. Our forecast from January was $57.4 billion, and it may be further increased slightly as we see additional ways for them to raise capex.
But what people are not really paying attention to is: what does this mean for next year? And the year after that?
As a result, within two or three years, TSMC could ramp its capital spending to $100 billion. By 2028, they could actually be spending $100 billion a year on capex. I'm serious.
Many people cannot fathom that number. But what does it mean for TSMC's supply chain? For companies like Lam Research, Applied Materials, and ASML? And for companies a layer deeper in that chain, like MKS Instruments?
The bullwhip effect will be further magnified.
If TSMC truly aims for $100 billion of capex in 2028, and I believe that is genuinely possible, many will find it insane, but it could actually happen.
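Editor's note: the bullwhip effect can be demonstrated numerically. In this toy model, with invented parameters, each upstream tier over-orders relative to the growth it observes downstream, so a modest rise in token demand compounds into much larger swings in equipment and materials orders.
```python
def amplify(downstream_growth: float, tiers: int, buffer: float = 0.5) -> list[float]:
    """Each tier orders for the growth it sees plus a safety buffer on top."""
    growth, orders = downstream_growth, []
    for _ in range(tiers):
        growth *= 1 + buffer  # over-order to rebuild inventory upstream
        orders.append(growth)
    return orders

# A 20% jump in token demand passed through four tiers:
for tier, g in zip(["cloud", "OEM", "chipmaker", "equipment"], amplify(0.20, 4)):
    print(f"{tier:>9}: orders up {g:.0%}")
# cloud 30%, OEM 45%, chipmaker 68%, equipment 101%
```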
Patrick O'Shaughnessy:
What about other parts of the chip ecosystem? GPUs have always been dominant. But will CPUs, ASICs, or something else emerge as new opportunities and bottlenecks? Apart from NVIDIA's GPU dominance, what other areas are worth paying attention to?
Dylan Patel:
Yes, ASICs are clearly taking off. But I want to step away from AI chips themselves for a moment. We did a project on FPGAs and found that each next-generation AI rack will require approximately 120 FPGAs. What does that mean for all the FPGA companies?
The same goes for CPUs. All these reinforcement learning environments, plus all the "garbage code" you and I generate, now running on some Vercel instance, some AWS instance, or some cloud resource we casually spin up: all of it needs CPUs. So CPUs are now completely sold out, and demand is rising fast.
Patrick O'Shaughnessy:
Help us understand, what role does the CPU play in the entire system?
Dylan Patel:
There are two main reasons you need a lot of CPUs.
First is reinforcement learning. When doing reinforcement learning, CPU is crucial.
In the past, you would throw the entire internet's data into a model to train it, and the model would spit out results. Now, you still feed internet data into the model, but then you also put the model into an environment and say, "Here, try this out." The model tries many different things, and the environment evaluates whether each attempt succeeded and gives it a score. These environments can be anything. They can be simple, like checking whether the output text is in the correct format or the structured output is valid. They can also be very complex.
People have now started to venture into very complex scenarios. For example: "I want you to open this file, modify it, edit it, update it, and then submit it to a website." Or: "I want you to open Siemens' physical simulation software, edit this CAD model." So, these environments are becoming increasingly complex. And these environments run on CPUs, not on GPUs or ASICs.
The ASIC or GPU is responsible for running the model itself: taking input from the environment, feeding it into the model, and generating different output paths, that is, the different ways the model thinks it can solve the problem. Those paths are then evaluated and scored, and the successful ones are used to keep training, updating, and iterating the model. So reinforcement learning environments are the first big source of CPU demand.
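Editor's note: a minimal sketch of the CPU-side environment role described above. The JSON-format check stands in for the far richer verifiers real labs run, and the sampled outputs are invented.
```python
import json

def reward(output: str) -> float:
    """CPU-side verifier: 1.0 if output is valid JSON with a 'ticker' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(data, dict) and "ticker" in data else 0.0

# Several sampled "paths" from the model for one prompt (GPU side, faked
# here); only the well-formed one earns reward and is kept for training.
rollouts = [
    '{"ticker": "TSM", "rating": "buy"}',
    "Sure! Here is the JSON you asked for: ...",
    '{"rating": "buy"}',
]
print([reward(r) for r in rollouts])  # [1.0, 0.0, 0.0]
```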
The second place is deployment.
When you have these very strong models and deploy them, the models will generate code, generate various useful outputs. But these outputs do not go directly from the GPU into a human brain. They come out from the GPU or ASIC, enter some application you've deployed, and that application itself usually runs on a CPU.
So, this is another area with a very high demand. CPUs have largely been sold out.
Why GDP Struggles to Capture AI's Value
Patrick O'Shaughnessy:
As you continue to assess supply and demand trends and try to become the world's foremost expert on these two things, what are some things you wish you knew but don't know yet?
Dylan Patel:
I think for us, and for everyone, the most challenging part to understand is tokenomics, which is the economics of tokens. We have a very good sense of how much it costs to run infrastructure, how much the token costs, how much the model costs, what the profit margins of these labs are. But what's really hard to model is the usage and adoption rates.
In January, we made some very aggressive predictions for February, and Anthropic easily exceeded them. So how do we calibrate this model? What data source should we use? By February, we made very aggressive assumptions for March, and once again, they were exceeded. When everyone saw the "additional $10 billion in revenue" figure, the reaction was: What's going on? How did they actually achieve an additional $10 billion in revenue? Who is using these tokens? Why are they using them? What exactly are they building with these tokens? More importantly, how is what they have built with these tokens diffusing into the economy? How much value has it created?
This is not something that can be easily captured by GDP statistics. For example, all the value I create using tokens ultimately translates into better information. I then sell this information, and compared to what others have sold information for in the past, I sell it at a lower price.
This information then enters the entire economic system, enabling people to make better investment decisions or better competitive decisions. For example, if they are semiconductor companies, data center companies, or hyperscalers, what is the value of this information? What impact does it have on the economy?
From any subjective standpoint, this is obviously very impressive. But the question is, where is the "Phantom GDP"? What exactly is Phantom GDP? How do we track the real economic value?
Because the existing GDP metrics are not accurate. If you were to ask Dylan Patel how much GDP he has created, the number would be very small and disproportionate to the value I believe I have actually created.
So the ultimate question is: How much value have these tokens really created? Not just looking at direct revenue, but looking at the ripple effects they bring. What is the subsequent impact of everything they have done?
I think this is the real question, and it's the most difficult challenge to measure. I think we already have a very good understanding of the supply side. We also have a very good understanding of many signals on the demand side. But what value these tokens have actually created is hard to quantify and measure. I hope we can have this kind of conversation every three months because the change is just too fast.
Anti-AI Protests Could Erupt Within Three Months
Patrick O'Shaughnessy:
So what do you think will happen next? For example, when I come back to San Francisco to meet you in three months, what do you expect to see?
Dylan Patel:
Massive anti-AI protests.
Patrick O'Shaughnessy:
Protests against AI? Tell us more.
Dylan Patel:
People hate AI. AI is now more unpopular than ICE, more unpopular than politicians. I don't know how Pew conducted their study, but apparently AI is more unpopular than politicians.
With Anthropic gaining so much traction, it will start driving downstream business changes. People will become increasingly fearful of AI. They will start attributing more and more of their problems, including many long-standing, deep-seated global problems, to AI.
These problems will surface and be blamed on AI. There will likely be politicians or social-media influencers who start weaponizing AI as an issue to attack others.
Go look at the comments under some news articles: Sam Altman's house got Molotov-cocktailed twice in two weeks, and people are cheering in the comments. This is just the beginning. So I think within three months we will see massive anti-AI protests.
Patrick O'Shaughnessy:
What is the counterforce to this sentiment?
