Editor's Note:
Against the backdrop of the continued evolution of generative AI and the entry of Agents into actual production processes, the focus of industry discussions is shifting from "how powerful the model is" to "how the system can support intelligence." As large-scale model training gradually becomes standardized, a more foundational issue is emerging: what supports the continuous expansion of AI is no longer just algorithmic breakthroughs, but the entire computing system itself.

This article is a translation of a conversation between Jensen Huang and Lex Fridman. Lex Fridman is a renowned AI researcher and tech podcast host whose show has long focused on in-depth discussions of technology, industry, and future trends. In this conversation, Jensen Huang did not focus on model capabilities per se, but instead, starting from computing architecture and industry evolution, put forth a more structural judgment: AI is transitioning from a "chip problem" to a "systems engineering problem."
This conversation can be roughly understood from five aspects.
Computing Transitioning from "Chip" to "Factory"
The first core judgment of the conversation is that AI's competition is no longer focused on single-point performance but has evolved into a competition of system capabilities. From GPUs, to entire machines, to data center-level "AI factories," the boundaries of computing units continue to expand. At the same time, the role of computers has also changed—from a "warehouse" for storing and retrieving information to a "production system" continuously generating tokens. This means that AI is no longer just a tool but is directly involved in the economic infrastructure of production.
Four Layers of Scaling: Why AI is Getting "Heavier"
In addition, there has been a structural change in AI's scaling path. Growth no longer relies on pre-training scaling alone but on the stacking of four types of scaling: pre-training, post-training, inference, and agentic, forming a cyclical system. Agents generate data, which enters training; training feeds back into inference; inference then supports more complex agents. All paths ultimately converge on one variable: computing power. The most crucial change is that inference is becoming the core of computational consumption, and "thinking" itself has become the most expensive link.
The Bottleneck of AI Shifts from Algorithms to Energy
As scaling continues to compound, the issue has shifted from the model level to the infrastructure level. A direct observation presented in the discussion is that AI's long-term bottleneck is no longer data or algorithms, but rather the power and energy system. However, the real constraint is not just inadequate supply, but rather grid scheduling, data center architecture, and enterprises' path dependency on "high availability." This transforms AI's problem from a technical issue to a comprehensive problem of engineering, energy, and institutional arrangement.
The Essence of CUDA: Market Share Rather Than Technological Advantage
On the competitive front, the discussion also provides a key insight: NVIDIA's moat is not just technological leadership, but the market share and developer ecosystem established through CUDA. By embedding CUDA into GeForce and sacrificing short-term profits for scale, NVIDIA has actually built a "computing platform." When scale, ecosystem, and execution speed are combined, technology itself becomes a secondary variable. This means that the AI competition is shifting from model capability to platform and system capability.
Will AI Take Jobs? No, but It Will Change the Definition of Work
At the application level, the discussion also makes an important assessment: AI will not simply replace professions, but rather reshape the structure of work. Task-level automation will enhance overall efficiency, increasing the demand for professional skills. The core of work is no longer "task execution" but "problem definition, tool invocation, and collaborative problem-solving," where intelligence gradually becomes an accessible capability, while human differences are more reflected in judgment and organizational skills.
If this discussion offers one clear takeaway, it is that it reframes AI from a "competition in model capability" into a systemic problem: as computation becomes a production system, the constraints will no longer be just the technology itself, but energy, supply chains, and organizational methods. From this perspective, the question is no longer whether a particular technological path is superior; rather, the entire world is being reorganized around computing as core infrastructure.
The following is the original content (reorganized for clarity):
TL;DR
· AI has evolved from "faster chips" to a "computing factory," where the competition is no longer single-point performance but the synergistic efficiency of the entire system (computing power, network, power, software)
· The success of CUDA lies not in the technology, but in its ubiquity: NVIDIA sacrificed profit for scale, establishing an almost unassailable computing platform ecosystem
· AI's growth is no longer just about larger models; it involves simultaneous scaling in pre-training, inference, agents, and more, all ultimately converging on one variable: computing power
· Inference is becoming a core part of computing consumption; "thinking" is more costly than "training," and AI is shifting from offline models to continuously running systems
· The real bottleneck for AI is not the algorithm but energy and infrastructure; power scheduling capacity will become the next critical constraint
· Computing is transitioning from an "information warehouse" to a "production factory"; tokens become tradable commodities, and AI infrastructure will directly participate in economic production
· AI will not simply replace work but elevate the capabilities of all professions; the future core competency will shift from "executing tasks" to "defining problems and collaborating to solve them"
Interview Content
Lex Fridman: Up next is a conversation with Jensen Huang, the CEO of NVIDIA, a company that can be said to be one of the most significant and influential in human history. NVIDIA is the core engine driving the AI revolution, and its success is largely due to a series of key judgments and bold decisions made by Jensen as a leader, engineer, and innovator. This is the Lex Fridman Podcast. So, please welcome Jensen Huang.
From "Faster Chips" to "AI Factory"
Lex Fridman: You've led NVIDIA into a new stage of AI, moving from a focus on chip-level design in the past to rack-level design today. It could be said that NVIDIA's past victories were largely based on building the most powerful GPU, and you're still doing that, but it has expanded to extreme co-design: GPU, CPU, memory, network, storage, power, cooling, software, the rack itself, the pods you've released, and even the entire data center. So, let's start with "extreme co-design." With so many complex components and variables, what is the most challenging part of system co-design?
Jensen Huang: That's a great question. First of all, the reason we must do extreme co-design is that the problems we are solving now can no longer be accelerated by a single computer, or even a single GPU. What you really want is for the speed-up of computation to exceed the rate at which you add computers: you might add 10,000 computers but want a 1,000,000x improvement in performance. So you have to rethink the algorithms, break them apart, reconstruct them, split the pipelines, split the data, split the model. When you distribute the problem this way, it's not just "scaling out," it's "distributing the problem," and then everything becomes a bottleneck.
This is essentially the problem of Amdahl's Law: the overall system speedup depends on the proportion of the work that can be sped up. If computation accounts for only 50% of the problem, then even if you make the computation a million times faster, the overall speedup is at most two times. So not only do you need to distribute computation, you also need to solve pipeline splitting and network connectivity, because all these computers have to interconnect. At our scale of distributed computing, the CPU is a problem, the GPU is a problem, the network is a problem, the switch is a problem, and load balancing itself is a problem. This is an extremely complex computer science issue. So we have to use all of these technologies together; otherwise you can only scale linearly, or rely on Moore's Law, which is also slowing down.
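Jensen's Amdahl's Law example can be checked with a few lines of Python. This is an illustrative sketch of the formula, not code from the conversation:

```python
def amdahl_speedup(parallel_fraction: float, speedup_factor: float) -> float:
    """Amdahl's Law: overall speedup when only `parallel_fraction`
    of the work is accelerated by `speedup_factor`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / speedup_factor)

# Jensen's example: computation is 50% of the problem; even a
# million-fold acceleration yields an overall speedup of barely 2x.
print(amdahl_speedup(0.5, 1_000_000))  # ~1.999998
```

The serial fraction, not raw compute, caps the gain, which is why splitting the pipelines, data, and model matters as much as adding GPUs.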
Lex Fridman: There must be a lot of trade-offs involved, and it involves experts from completely different fields, such as high-bandwidth memory, networking, NVLink, NICs, optics, copper interconnects, power, cooling, and so on. Each field has world-class experts. How did you bring these people together to collaborate?
Jensen Huang: This is why my team is so large.
Lex Fridman: Can you talk about this process? How do experts and generalists collaborate? What is the overall design process when you have to fit all these things into a rack?
Jensen Huang:
This can be answered with three questions. The first question: what is "extreme co-design"? Essentially, it is holistic optimization across the entire software and hardware stack, from architecture, chip, and system to system software, algorithms, and applications; that is the first layer. The second layer, as we just mentioned, covers not just the CPU, GPU, and network chips but also the scale-up and scale-out switching systems, as well as power and cooling, because these computer systems consume a lot of power. They are very efficient, but in total they still draw a significant amount of electricity.
So the first question is "what it is." The second question is "why do it": as we just mentioned, you need to distribute workloads to achieve gains beyond simply adding more computers.
The third question is "how to do it." This is actually the most magical part of this company. When you design a computer, you need an operating system; when you design a company, you should first think about what the company is going to produce. I have seen the organizational charts of many companies, and they all look similar: layered like a hamburger, whether at software companies or car companies, and in my opinion that does not make sense. The goal of a company is to become a machine that produces products; it is a mechanism, a system for continuously producing the products we want.
The organizational structure of a company should reflect the environment it operates in. To some extent, this also determines how the organization should function. My direct reporting team is about 60 people. I don't communicate with them one-on-one because that's impossible. If you have 60 direct reports and work to do, you can't accomplish it through one-on-one interactions.
Lex Fridman: But you still have 60 direct reports?
Jensen Huang: More than that. And these people mostly have an engineering background, with memory experts, CPU experts, optics experts, GPU experts, architects, algorithmists, design experts.
Lex Fridman: That's amazing.
Jensen Huang: Yes.
Lex Fridman: So, you have essentially been overseeing the entire tech stack and participating in deep discussions about overall design?
Jensen Huang: And we don’t have “one-on-ones.” We pose a question, and then everyone solves it together. Because we are doing extreme collaborative design, the company does this every day.
Lex Fridman: So, even if you're discussing a specific component, like cooling or networking, everyone is involved?
Jensen Huang: Yes, that's exactly right.
Lex Fridman: Everyone can say, "This solution doesn't work for power," "This doesn’t work for memory"?
Jensen Huang: That's correct. Anyone who wants to participate can, and those who don't can opt out. But everyone on the team knows when they should participate. If there's a problem someone should have contributed to but didn't, I call them in.
Lex Fridman: So how has NVIDIA evolved with changing environments? From initially making gaming GPUs to deep learning and now the “AI factory” — how did this transition happen?
Jensen Huang:
This can be logically deduced. We started out as an accelerator company. However, the issue with accelerators is that they are too narrowly applicable. Their strength lies in high optimization, like all specialized systems, but the problem is the more specialized it is, the narrower the market. That, in itself, is not the biggest issue. The more critical aspect is that the market size determines your R&D capabilities, and ultimately, your influence in the computing space.
So when we initially did the accelerator, we knew that was just the first step. We had to find a path to what we called "accelerated computing." But the problem is, once you become a computing company, you become too general-purpose, which weakens your specialized capability. These two words are in deliberate tension: computing versus specialized. The more you look like a computing company, the less you look like a specialized system; the more specialized you are, the harder it is to cover the entire computing landscape.
So companies have to find a very narrow path, incrementally expanding the boundaries of computing capability while not losing the most core specialized capability.
Our first step was to invent the programmable pixel shader, which was the first step into "programmability." For the second step, we put FP32 into the shader, IEEE-standard single-precision floating point. That was a crucial step, because it made many people working on stream processors and dataflow computing notice us. They began to realize that this highly capable, standards-compliant GPU could potentially be used for general-purpose computing, so they started trying to migrate software originally written for the CPU onto the GPU.
Next, we introduced C language on top of FP32, forming Cg, which further evolved into CUDA. Putting CUDA on GeForce was an extremely critical decision, but the company actually couldn't afford the cost at that time. The reason we still did it is that we wanted to become a computing company. And a computing company must have a unified computing architecture that remains consistent across all chips.
The Decision That Almost Sank the Company, Yet Carried the Entire AI Era
Lex Fridman: Could you talk about that decision? At that time, when the cost was clearly unaffordable, why did you still put CUDA on GeForce?
Jensen Huang: This was a decision close to "life or death." I would say it was our first strategic decision made under an existential threat.
Lex Fridman: For those who may not know, this was later proven to be one of the greatest decisions in the company's history. CUDA became the core computing platform of the entire AI infrastructure.
Jensen Huang:
Yes, it was later proven to be the right decision. The logic at the time was this: we invented CUDA, which expanded the range of applications our accelerator could cover. But the question was, how do you attract developers? Because the core of a computing platform is developers, and developers wouldn't come just because a platform is 'interesting'; they choose platforms with significant deployment.
Adoption is the most critical factor. Developers, like everyone else, want their software to reach more users, so the installed base is a key determinant of success. An architecture can be heavily criticized, as x86 is for being inelegant, yet it remains the dominant architecture today because of its massive adoption.
In contrast, many RISC architectures are elegantly designed by top computer scientists but ultimately failed. This illustrates a point: adoption rate defines the architecture, and everything else is secondary.
At that time, CUDA faced competition, such as OpenCL, and so on. The key decision we made was: since adoption rate is paramount, we must find a way to quickly bring this new architecture to the market.
At that time, GeForce was already very successful, shipping millions of GPUs annually. So, we decided to embed CUDA into every GeForce, making it part of every PC—whether users used it or not. This was the fastest way to build adoption.
Simultaneously, we went to universities, wrote textbooks, and offered courses to bring CUDA everywhere. In that era, PCs were the primary computing platforms, pre-cloud. We essentially put a "supercomputer" into the hands of every student, every researcher.
However, this significantly drove up the cost of GPUs, nearly consuming all of the company's gross margin. At that time, the company was valued at around six to seven billion dollars; after introducing CUDA, the valuation dropped to around 1.5 billion dollars because of the increased cost. We went through a very difficult phase, but we persevered.
I always say NVIDIA is a house built on GeForce. Because it was GeForce that brought CUDA to everyone. Researchers, scientists, engineers, all discovered CUDA through GeForce. Many people were gamers, building their own PCs, constructing clusters in labs with PC components—that was the starting point of CUDA taking off.
Lex Fridman: And that then became the foundational platform for the deep learning revolution.
Jensen Huang: Exactly, that's a very important observation.
Lex Fridman: Do you remember how the internal discussions went during that almost "life or death" moment?
Jensen Huang: I had to explain to the board what we were doing, and the management team understood that our gross margin would be severely compressed. You can imagine a scenario: GeForce bore the cost of CUDA, but gamers wouldn't foot the bill for it. They were only willing to pay a fixed price and wouldn't pay more because your costs went up.
We raised the cost by 50%, while the company's gross margin was only 35%, so this was a very tough decision. But we could envision a future: CUDA would enter workstations and supercomputers, and in those areas we might achieve higher profits. Logically, you can persuade yourself that this is feasible, but actually realizing it took ten years.
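The squeeze Jensen describes can be checked with back-of-the-envelope arithmetic. The sketch below normalizes the selling price to 100 and assumes the 50% increase applies to the whole cost of goods; the numbers are illustrative, not NVIDIA financials:

```python
price = 100.0                # normalize the GeForce selling price
cost = price * (1.0 - 0.35)  # 35% gross margin -> cost is 65 per unit
new_cost = cost * 1.5        # CUDA added roughly 50% to that cost
new_margin = (price - new_cost) / price

# Gamers won't pay more, so the price is fixed and margin absorbs the hit:
print(new_margin)  # 0.025 -> gross margin collapses from 35% to about 2.5%
```

Under these assumptions, a fixed price plus a 50% cost increase leaves almost no margin, which is why "nearly consuming all of the company's gross margin" is not an exaggeration.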
Lex Fridman: But that is more about communication with the board. From your personal perspective, how did you make this kind of "bet on the future" decision? NVIDIA has always made bold decisions to predict the future, even define the future, how did you do it?
Jensen Huang: First of all, I have a strong curiosity. Then there is a reasoning process that makes me very convinced that a certain result will definitely happen. When I truly believe in something in my mind, that kind of future becomes very clear, almost impossible not to happen. There will be a lot of pain in the middle, but you have to believe in what you believe in.
Lex Fridman: So, do you first build the future in your mind and then engineer it out?
Jensen Huang:
Yes. You will reason about how to get there, why it must exist. We will reason repeatedly, and the management team will also participate, we will spend a lot of time on this.
Next is a very crucial ability. Many leaders stay silent at first, learning, and then one day suddenly make a "declaration": a New Year's reset, a big adjustment, a large layoff, an organizational restructuring, a new mission, a new logo. I don't do it that way.
When I begin to realize that something is important, I immediately tell the people around me: this is important and will have an impact. I will explain step by step. Many times I have already made the decision, but I will use every opportunity—new information, new insights, new engineering progress—to continuously shape everyone's understanding.
I do this every day, with the board, with the management, with the employees. I am constantly shaping their belief systems. So, when one day I say, "We are going to acquire Mellanox," everyone will feel that this is an obvious decision.
When I say, "We are going all-in on deep learning," in fact, I have laid the groundwork for a long time. When I announce it, many people will actually say, "Why are you only saying this now?"
In a sense, this is like "leading from behind," but in reality, you have been shaping consensus all along. You want everyone to walk together, rather than suddenly announcing a decision that no one understands.
Lex Fridman: And you're not just shaping cognition within the company, you're shaping the entire industry.
Jensen Huang:
We don't actually sell computers directly, nor do we sell the cloud directly. We are a computing platform company. We do vertical integration design at every layer, but at the same time, we are open at every layer to allow other companies to integrate it into their own products, services, clouds, and supercomputers.
So if I can't convince the entire industry first, my product won't land. That's why GTC is so important—it's about "previewing the future." When we actually release the product, everyone will say, "Why are you only doing this now?"
Why Is AI Burning More and More Cash? Four Types of Scaling Are Stacking Up
Lex Fridman: You have long believed in scaling law. Do you still believe in it now?
Jensen Huang: Of course, and now there are more scaling laws.
Lex Fridman: You mentioned four types before: pre-training, post-training, inference stage, and agentic scaling. When you look to the future, whether in the short term or the longer term, what potential "bottlenecks" are you truly concerned about? What are the issues that you feel must be addressed, even keeping you up at night?
Jensen Huang: Let's look back at the perceived "bottlenecks" of the past.
Initially, it was pre-training scaling, where people believed that limited high-quality data would restrict AI's intelligence improvement. Ilya Sutskever even said, "We're running out of data," creating industry panic. But the reality proved otherwise. We will continue to expand data sources, with a large part of it being synthetic data. In fact, the information passed between humans is fundamentally "synthetic" as well. You create content, I consume it, then process it, and transmit it. Now AI can start from real data, expand, enhance, and generate a large amount of data. Therefore, the post-training phase is still expanding. The future limitation of model training will no longer be data but computing power.
Next comes the inference phase. Many people used to think that inference was simple, and training was hard. But this is actually unreasonable because inference is essentially "thinking," which is much harder than "reading." Training is more like memorization and pattern recognition, while inference involves reasoning, planning, searching, problem decomposition, all of which require a lot of computation. As it turns out, our initial assessment was correct, and inference computation is very intensive.
Moving forward, we have agentic scaling. We are now not just a model but an "agent system" that can call tools, access databases, and generate sub-agents. Similar to a company, rather than enhancing one person's ability, it is easier to expand capabilities by adding to the team. AI is the same way, able to rapidly replicate and scale. So, this is a new scaling law.
These processes will form a cycle: the agent generates data, the data goes back to pre-training, then enters post-training, then enters inference, then enters the agent system, continually cycling. Ultimately, the growth of intelligence boils down to a core variable: compute power.
Lex Fridman: But here's a challenge; you must anticipate these changes in advance because different stages require different hardware, such as the MoE architecture, sparsity, and so on. And the hardware cycle is over a few years, so you can't adjust whenever you want.
Jensen Huang: Exactly. AI model architectures change about every 6 months, while system architectures and hardware change about every 3 years. So you must anticipate the future two to three years in advance. We have three methods: first, we do our own research, including basic research and applied research, we build our models; second, we collaborate with almost all AI companies to understand their challenges; third, we build a sufficiently flexible architecture, such as CUDA, which is both efficient and flexible.
For example, when MoE emerged, we introduced NVLink 72, which can run a 100 trillion parameter model as if it were a single GPU. Another example is the Grace Blackwell rack and the Vera Rubin rack, their designs are completely different because the former is designed for LLM inference, while the latter is designed for agent systems.
Lex Fridman: But these designs were completed before the emergence of Claude Code, Codex, and OpenClaw. How did you anticipate that?
Jensen Huang: It's actually not that difficult, you just need to use reasoning. Suppose LLM is to become a "digital employee," it must access real data, perform research, use tools. So it must have an I/O system, make tool calls. Some say AI will replace software, but that's not correct. Just like a robot, it won't turn its hand into a hammer or scalpel, but rather use tools. It's okay if it doesn't know how to use them the first time, it can read the manual and learn quickly. So these capabilities are inevitable.
When you reason like this, you'll find that we've actually reinvented the computer. The agent architecture I talked about at GTC two years ago almost exactly corresponds to OpenClaw today. The significance of OpenClaw to agents is similar to what ChatGPT means for generative AI.
Lex Fridman: Indeed, it's a special moment.
Jensen Huang: Yes.
Lex Fridman: But there is also an issue here; when technology becomes so powerful, it also brings security risks. We as individuals and as a society are trying to find a balance.
Jensen Huang: Yes, we immediately involved a large number of security experts to study this issue. We created a system called OpenShell, which is now integrated into OpenClaw. At the same time, NVIDIA also introduced NemoClaw.
Lex Fridman: Yes, its installation is also very simple and can ensure system security.
Jensen Huang: We proposed a principle: at any given time, one can only possess two out of three capabilities—access to sensitive data, code execution, external communication. If all three capabilities are simultaneously present, it poses a risk. So we ensure security through this "choose-two-out-of-three" approach. Additionally, we have incorporated enterprise-level access control and a policy engine, allowing companies to manage based on their own permission systems. We will do our best to make OpenClaw more secure and controllable.
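The "choose two out of three" rule Jensen describes is easy to express as a policy check. The sketch below is a hypothetical illustration; the capability names are made up for the example and are not an actual OpenClaw or NemoClaw API:

```python
# The three capabilities named in the conversation; holding all
# three at once is what creates the risk.
CAPABILITIES = {"sensitive_data_access", "code_execution", "external_communication"}

def grant_allowed(requested: set) -> bool:
    """Allow a grant only if the agent would hold at most two of
    the three sensitive capabilities at any given time."""
    return len(requested & CAPABILITIES) <= 2

print(grant_allowed({"sensitive_data_access", "code_execution"}))  # True
print(grant_allowed(CAPABILITIES))                                 # False
```

A real policy engine would also track capabilities already held across a session, not just a single request; this only captures the core invariant.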
The Limit of AI Is Not the Algorithm, but Electricity
Lex Fridman: You just talked about many things that were once considered bottlenecks in the past but were later overcome. So looking at it now, in a future where agents will be ubiquitous, what will be the real bottleneck?
Lex Fridman: But you don't seem to see the supply chain as the most worrying bottleneck?
Jensen Huang: Because I have systematically addressed these issues one by one, I can now sleep soundly. We reason from first principles: What does a shift in system architecture entail? How will software change? How will engineering processes change? How will the supply chain change? For example, the NVLink 72 rack shifted the integration of supercomputing from inside the data center out to the supply chain. Previously, components were delivered to the data center for assembly; now they are assembled into complete systems within the supply chain and then shipped.
This means that the supply chain itself needs to have stronger manufacturing capabilities, such as supporting large-scale power testing. We even need the supply chain to have gigawatt-level power capability to test these systems. So I will personally communicate with suppliers, tell them about future needs, and get them to invest billions of dollars. They trust me, and I will give them enough information and time to understand these changes.
Lex Fridman: So, are you worried about specific bottlenecks? Like EUV, packaging capacity, and so on?
Jensen Huang: I'm not. Because I have told them what I need, and they have told me how they will do it. I trust them.
Lex Fridman: Now let's get back to the power issue. How do you see the energy issue?
Jensen Huang: I would like everyone to pay attention to a fact: Our power grid is designed based on the "worst-case scenario," such as peak demand during extreme weather. But in reality, 99% of the time, we are far from reaching that peak, most of the time maybe only operating at around 60%. This means that most of the time, the power grid has a lot of idle capacity, but this capacity must exist because critical infrastructure like hospitals and airports must have power at crucial moments.
So what I'm thinking is, can we design a mechanism where when the power grid needs to operate at full capacity, data centers reduce power consumption; and during most of the time, utilize this idle power? For example, data centers can reduce performance, migrate tasks, or even temporarily downgrade services during peak times. This way, we can more efficiently use the power grid.
But the current problem lies in three areas: First, customers demand that data centers must be 100% available; second, data center designs must support this dynamic downscaling; and third, power companies also need to provide more flexible power supply modes. If all three points are achieved, we can significantly improve power utilization efficiency.
So I think the future of how we use computers and build data centers should not just aim for 100% uptime. These very stringent contracts actually put a lot of pressure on the electricity grid because they require the grid to not only meet peak demand but also to continue expanding on top of that. What I really want to leverage is just that part of the surplus idle power.
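The headroom argument reduces to simple arithmetic. The sketch below uses the conversation's rough figure of ~60% typical utilization; the 10 GW grid size is a hypothetical number chosen for illustration:

```python
def idle_headroom_mw(peak_capacity_mw: float, typical_utilization: float) -> float:
    """Capacity a grid sized for worst-case peak leaves idle most of
    the time -- the surplus that flexible data centers could soak up."""
    return peak_capacity_mw * (1.0 - typical_utilization)

# A grid built for a 10 GW peak but typically running at ~60%
# leaves about 4 GW of capacity idle most of the time.
print(idle_headroom_mw(10_000, 0.60))  # 4000.0 (MW)
```

The catch, as Jensen notes, is that this surplus is only usable if data centers can shed load on demand and contracts stop requiring flat 100% availability.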
Lex Fridman: This point is indeed not discussed enough. What do you think is currently the main obstacle?
Jensen Huang: I think this is a tripartite problem.
First is the end customer. The end customer makes demands to the data center: you absolutely cannot go offline, absolutely cannot be unavailable. In other words, what the customer expects is perfection. And to achieve this perfection, you need backup generators, and the electricity grid provider also needs to be close to perfect. So every link needs to strive for "six nines."
So I think the first thing is to make all customers, all CEOs really aware of what they are actually asking for. Many times, the people signing these contracts are actually just someone in the data center operations team, far removed from the CEO. I bet many CEOs have no idea what these contract terms are. I am ready to talk to all of them.
These CEOs are probably not even aware of the contracts being signed. Everyone wants to sign the best contract, which is understandable, of course. These requirements are then passed down layer by layer to cloud providers, who pass them on to utilities, so the entire chain ends up demanding "six nines."
The second thing is that we must build data centers that can "gracefully degrade." In other words, if the grid tells us, "You need to reduce your power usage to 80%," we should be able to say, "No problem."
We can reschedule workloads. We will make sure data is never lost, but we can reduce the computing rate, use a little less power. The quality of service will decrease slightly. For the most critical workloads, I will immediately migrate them elsewhere so they are not affected. So, whichever data center can still maintain 100% uptime, let it handle the most critical part.
Lex Fridman: How challenging is this intelligent, dynamic power allocation for data centers from an engineering perspective?
Jensen Huang: As long as you can define the problem clearly, you can engineer it. You raised the question exceptionally well. As long as it aligns with the physical laws at the level of first principles, I believe we can achieve it.
Lex Fridman: You just mentioned three things, what was the third one?
Jensen Huang: The second one is the data center itself, and the third is that utility companies also need to realize this is actually an opportunity.
They can't always say, "You have to wait five years for me to expand the grid to that level of capacity." They could instead say: if you are willing to accept a lower level of power assurance, I can supply you with power at that price next month.
So if utility companies can also provide more layered power commitments, I think the market will find corresponding solutions on its own. There is currently too much waste in the power grid, and we should leverage it.
Lex Fridman: You previously highly praised Elon Musk's ability to build the Colossus supercomputer in Memphis. What do you think is worth learning from his approach?
Jensen Huang: Elon is involved in a very wide range of areas, but he is a very strong systems thinker. He continuously asks: Is this really necessary? Must it be done this way? Why is it taking so long? He compresses the system to the minimum required complexity while retaining core capabilities.
He is also extremely hands-on, wherever there is a problem, he goes there. He breaks a lot of "conventions" and "processes" to truly drive things forward. Additionally, his sense of urgency permeates the entire supply chain. He makes all suppliers prioritize him, which is crucial.
Lex Fridman: Do you have a similar approach in NVIDIA's co-design?
Jensen Huang: Co-design itself is the ultimate form of systems engineering. We also have a concept called "speed of light thinking." This is not just speed, it's the physical limit. We benchmark all issues against the physical limit: memory speed, processing speed, power consumption, cost, time, manufacturing cycle, and so on. We first ask: At the physical limit, what can be achieved? And then make trade-offs in reality.
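"Speed of light" can be taken quite literally as a benchmarking method: compute the physical lower bound first, then measure reality against it. A toy example with assumed distances and an assumed ~0.65c propagation speed (a typical ballpark for copper and fiber, not an NVIDIA figure):

```python
# Physical lower bound on one-way signal latency over various distances.
C = 299_792_458           # speed of light in vacuum, m/s
PROP_SPEED = 0.65 * C     # assumed signal speed in copper/fiber (~0.65c)

for label, meters in [("across a rack", 2.0), ("across a data hall", 100.0)]:
    ns = meters / PROP_SPEED * 1e9
    print(f"{label:20s}: {ns:7.1f} ns one-way minimum")
```

Whatever a real interconnect measures above these numbers is, by definition, overhead that engineering could in principle remove; that is the benchmark the "speed of light" framing sets.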
I don't really like the "continuous optimization" approach. If a process currently takes 74 days, and someone says it can be optimized to 72 days, I don't quite accept it. I'd rather start from scratch and ask: Why is it 74 days? If starting from scratch, how quickly can it be done now? Many times the answer might be 6 days. Then you understand why the remaining 68 days exist.
Lex Fridman: In such a complex system, is the principle of "simplicity" still important?
Jensen Huang: Of course. What we pursue is "necessary complexity" and "simplicity wherever possible." We must continuously ask: Is this complexity necessary? If not, remove it.
Lex Fridman: But your system is already extremely complex, such as the Vera Rubin pod, with trillions of transistors and thousands of GPUs.
Jensen Huang: Yes, this is the most complex computer system in the world.
Lex Fridman: This is very interesting. You recently visited China. So I'm very curious to ask you a question: China's astonishing rise in the tech industry over the past decade has been remarkable. How do you see them building so many world-class companies, world-class engineering teams, and such a tech ecosystem that consistently produces amazing products in such a short time?
Jensen Huang: There are many reasons. Let's start with some basic facts. Globally, about half of AI researchers are Chinese, roughly speaking, and most of them are still in China. We have many here as well, but China itself still has a large number of excellent researchers. China's tech industry emerged at a critical time, the era of mobile internet and cloud computing. Their main contribution path is software, and this country has a very strong foundation in science and math education; young people are highly educated. Growing up in the software era, they are very familiar with modern software systems.
Additionally, China is not a single economic entity but is made up of multiple provinces and cities that compete with each other. This is why you see a large number of new energy vehicle companies, numerous AI companies, and almost every industry has many companies simultaneously doing similar things. This internal competition is very fierce, and typically, only excellent companies survive.
Furthermore, their social culture is "family first, friends second, company third." In this structure, information exchange between different companies is very frequent, which in effect creates a long-term open environment. Their greater investment in open source follows naturally, because they tend to ask, "What are we really protecting?" Relationships among engineers overlap significantly: relatives, friends, classmates, with "classmates" being almost a lifelong bond. This rapid dissemination of knowledge makes open source more efficient, since there is little strong proprietary attachment to the technology itself, and the open-source community further amplifies and accelerates the innovation process.
So you will see that the combination of top talent, open-source-driven rapid innovation, highly interconnected relationships, and intense competition ultimately produces very strong technical outcomes. From this perspective, China is currently the fastest-innovating country in the world. Behind all this is the foundational factors—the education system, the emphasis on learning within families, cultural structure, and the fortuitous positioning within a key window of exponential technological development.
Lex Fridman: Culturally, being an engineer is a very "cool" thing.
Jensen Huang: Yes, it is an "engineer-type country." Many leaders in the U.S. have a legal background, which is for governance and institutional stability; whereas many leaders in China are themselves outstanding engineers.
Lex Fridman: You mentioned open source earlier, and I'd like to dive into that. You have always held Perplexity in high regard.
Jensen Huang: I love it.
Lex Fridman: Also, thank you for open-sourcing Nemotron 3 Super, a 1.2 trillion parameter MoE model that can now be used in Perplexity. How do you view the long-term significance of open source? Companies like China's DeepSeek and MiniMax are driving open-source AI, and NVIDIA is also working on near-SOTA open-source models. What is your overall assessment?
Jensen Huang: First, if we are to be an excellent AI computing company, we must understand how models are evolving. What I really like about Nemotron 3 is that it is not a pure Transformer but a hybrid of Transformer and SSM. We also invested early in conditional GANs and progressive GANs, lines of work that gradually evolved into diffusion models. It is this accumulation in model architecture and foundational research that lets us anticipate early what kind of computing systems future models will require. This, in itself, is part of our "extreme co-design."
Second, on the one hand, we need to have world-class models as products, which can be proprietary; but on the other hand, we also hope that AI can spread across all industries, countries, researchers, and students. If everything is closed off, it is challenging to conduct research and to innovate further on this basis. Therefore, for many industries, open source is a necessary condition for participating in the AI revolution. NVIDIA has the scale and the motivation to continuously build these models in the long term, and we also have the ability to drive the entire ecosystem to get more people involved.
The third point is, AI is not just language. Future AI will call upon tools, submodels, and involve different modalities such as biology, chemistry, physics, fluids, thermodynamics, which do not all exist in language form. Therefore, there must be continuous efforts to advance directions like weather AI, bio AI, physics AI, etc., and constantly approach the frontier. We don't make cars, but we hope that every car manufacturer can use the best models; we don't engage in drug development, but we hope that companies like Gilead can have the best bio AI system.
So, looking at AI from the breadth of AI, the popularity of AI, and the collaborative evolution of AI and computing architecture, open source is necessary.
Lex Fridman: Once again, thank you for open-sourcing Nemotron 3.
Jensen Huang: We have not only open-sourced the model, but also the weights, data, and construction methods.
Lex Fridman: Truly remarkable.
Jensen Huang: Thank you.
Lex Fridman: You were born in Taiwan and have had a long-term partnership with TSMC. I would like to ask: how do you understand TSMC's culture, and how has it achieved such unique success?
Jensen Huang: The biggest misconception outsiders have about TSMC is that its core is only technology. Of course, their technology is indeed very strong, including transistors, metal layers, advanced packaging, 3D packaging, and silicon photonics. But what truly sets them apart is their coordination capabilities in response to the entire industry's demands.
They need to simultaneously address the constantly changing needs of hundreds of global customers: order increases or decreases, customer switches, emergency add-ons, production pauses, restarts, and so on. Despite such a highly dynamic environment, they are still able to maintain high throughput, high yield, low cost, and an extremely high level of service.
They take commitments extremely seriously. When they say a wafer will be delivered at a certain time, it will be delivered, and this directly affects the operations of customer companies. Therefore, their manufacturing system itself can be described as a miracle.
The second point is culture. On one hand, they continue to drive the technological frontier, and on the other hand, they are highly customer-centric. Many companies can only do one of these well, but they have managed to do both at a world-class level.
The third point is an intangible asset, trust. This is very important. I could completely build my company on top of theirs, and this trust is accumulated through long-term cooperation.
Lex Fridman: This trust comes from both long-term collaboration and interpersonal relationships.
Jensen Huang: Yes. We have been collaborating for thirty years, involving tens if not hundreds of billions of dollars in business, but we don't even have a contract between us.
Lex Fridman: Truly amazing. There's a story that in 2013, TSMC founder Morris Chang invited you to be TSMC's CEO, and you declined. Is that true?
Jensen Huang: That is true. I was very honored, but at that time, I was also very clear that what NVIDIA was doing was extremely important. I had seen what it would become in the future and the impact it could have. It was my responsibility, and I had to make it happen. So I declined, not because the opportunity was not important, but because I couldn't walk away from it.
Lex Fridman: I think NVIDIA and TSMC are both one of the greatest companies in human history.
Jensen Huang: Thank you.
Lex Fridman: I have to ask a question. Using the words commonly heard in the tech industry, what is your biggest "moat," meaning, what is the core advantage that helps you fend off competition?
Jensen Huang: At the core, it is the scale of our computing platform, which is the installed base of CUDA. We didn't have this advantage twenty years ago, but today, the situation is completely different. Even if someone were to develop a technology similar to CUDA, it would be challenging to change the current landscape. Because the key has never been just the technology itself, but the systemic advantage formed by long-term investment, continuous iteration, and ongoing expansion.
The success of CUDA was not achieved by a handful of people but was the result of 43,000 employees and millions of developers working together. Developers choose to develop on CUDA because they believe we will maintain this platform for the long term and continue to drive its development. Therefore, the "installed base" itself is the most crucial advantage.
When this scale advantage is combined with our execution speed, it creates a stronger barrier. Historically, few companies have been able to build such a complex system at this speed, let alone continuously iterate on an annual cadence.
From a developer's perspective, if you choose to support CUDA, you can expect it to be stronger six months later, and at the same time, you can reach hundreds of millions of devices worldwide, covering all cloud platforms, nearly all industries, and various countries. If you open-source a project and prioritize CUDA support, you not only gain scale but also growth velocity.
Adding to that is the aspect of "trust," where developers believe NVIDIA will maintain this ecosystem in the long term. If I were a developer, I would prioritize choosing CUDA.
The second advantage is our ecosystem. We are highly integrated into the computing system vertically and embedded horizontally in almost every company's product stack. We exist on Google Cloud, Amazon, Azure, and also on new cloud platforms like CoreWeave, covering supercomputers, enterprise systems, edge devices, cars, robots, satellites, and even space.
In other words, a unified computing architecture that has penetrated nearly every industry.
Lex Fridman: So, as AI factories develop, how will CUDA's installed-base advantage evolve? Will the future NVIDIA essentially become an "AI factory company"?
Jensen Huang: In the past, our compute unit was the GPU; later it became a whole computer, then a cluster; now, it's a complete AI factory. In the past, when I launched a new generation product, say an Ampere launch, I would hold up a chip. That was my "mental model" at the time. But today is different. Holding up a chip has become almost "cute": it no longer represents what we truly built.
Now the model in my mind is a huge system: it connects to the grid, has power generation, cooling systems, extremely complex network structures, tens of thousands of people installing on-site, and tens of thousands of engineers supporting behind the scenes. Starting up such a system is not a matter of flicking a switch; it requires thousands of people working together.
Lex Fridman: So when you think about "a computing unit" now, you're actually thinking about a whole set of racks, a pod, rather than a single chip?
Jensen Huang: It's the entire infrastructure. And I hope my next cognitive leap is to understand the act of "building a computer" as a "planetary-scale" issue. That would be the next step.
Lex Fridman: Do you think NVIDIA could potentially reach a $1 trillion market cap in the future? Or looking at it from a different angle, if that were to happen, what would the world look like?
Jensen Huang: I believe NVIDIA's growth is highly likely, even inevitable in my view. Let me explain the reason.
First, we are already one of the largest computing companies in history. This alone is worth pondering: Why is that?
There are two reasons, both fundamental technology shifts.
First, there has been a shift in the computing paradigm. Past computing was fundamentally a "retrieval system." We pre-wrote content, recorded content, generated files, and then retrieved this content through a recommendation system or search system. In other words, it was a "human-pre-generated + file-retrieval" system. Now, AI computation is based on context, requiring real-time processing and token generation. We have transitioned from "retrieval-based computing" to "generation-based computing."
In the old system, we needed a lot of storage; in the new system, we need a lot of computation. Therefore, the computational demand will increase significantly. The only scenario that could change this trend is if this generative computation proves to be ineffective. But in the past 10 to 15 years of deep learning research, and the recent progress in the last 5 years, I am more confident than ever before.
The second change is that the role of computers in the world has changed. In the past, computers were more like a warehouse; now, they are more like a factory. A warehouse itself does not directly generate revenue, while a factory is directly linked to revenue. Computers are no longer just storage systems, but production systems. The "goods" they produce are tokens. And these tokens are consumed by different groups of people in tiers, just like the iPhone: there are free tiers, mid-tiers, and high-end tiers.
Intelligence, fundamentally, has become a scalable product. In the future, there will soon be a situation where someone is willing to pay $1,000 for every million tokens. It's not a question of whether this will happen, it's just a matter of time.
So the question becomes: How many "AI factories" does the world need? How many tokens are needed? How much is society willing to pay for these tokens? If productivity increases significantly as a result, what changes will occur in the global economy? Will we discover new drugs, new products, new services?
When you consider all these factors together, I am very certain: Global GDP will accelerate. At the same time, the share of expenditure on computation will be an order of magnitude higher than in the past.
In this context, coming back to NVIDIA: our role in this new economy will be much larger than it is now. As for the numbers, such as whether it is possible to reach $3 trillion in revenue in the future? The answer is, of course, possible. Because this is not constrained by any obvious physical limits.
NVIDIA's supply chain is supported by 200 companies working together, and we are expanding through the entire ecosystem. The only real constraint is: energy. And I believe the energy issue can ultimately be solved.
So, these numbers themselves are just "numbers." I remember when NVIDIA first surpassed $1 billion in revenue, someone told me, "A fabless semiconductor company cannot exceed $1 billion." Later, someone else said, "You cannot exceed $25 billion."
These assessments are not based on first principles. The real question to ask is: What are we creating? How big is this opportunity?
NVIDIA is not competing for existing market share. Much of what we do is for a market that does not yet exist. That's why it's hard for outsiders to imagine our limit because there is no ready reference point. But I have enough time. I will continue to deduce and articulate. Every GTC will make that future more concrete. In the end, we will take that step. I am 100% certain of this.
Lex Fridman: Looking at it from the perspective of a "token factory," the whole system can actually be understood as: generating tokens per watt, per second, and each token has value, with different value to different people. In this way, the whole world is made up of numerous "token factories." Starting from first principles, as long as the problems AI can solve continue to increase, we can deduce that the demand for these "factories" in the future will grow exponentially.
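Lex's "tokens per watt, per second" framing can be turned into a toy revenue model. All the numbers below (factory size, tokens per joule, price per million tokens) are purely illustrative assumptions, not figures from the conversation:

```python
# Toy "token factory" economics: convert a power budget into annual token revenue.
SECONDS_PER_YEAR = 365 * 24 * 3600

def annual_token_revenue(power_mw: float, tokens_per_joule: float,
                         price_per_million_usd: float) -> float:
    """Revenue of a factory that runs at full power all year producing tokens."""
    joules_per_year = power_mw * 1e6 * SECONDS_PER_YEAR
    tokens_per_year = joules_per_year * tokens_per_joule
    return tokens_per_year / 1e6 * price_per_million_usd

# Hypothetical: a 100 MW factory, 1 token per joule, $1 per million tokens.
print(f"${annual_token_revenue(100, 1, 1.0):,.0f} per year")  # about $3.15 billion
```

The point of the model is the structure, not the numbers: once tokens have a market price, a factory's revenue is just watts times efficiency times price, which is exactly why energy and tokens-per-watt become the governing variables.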
Jensen Huang: Yes. One thing that excites me a lot is that the "iPhone moment of tokens" has arrived.
Lex Fridman: What do you mean?
Jensen Huang: Agent. Agent is becoming the fastest-growing application form in history.
Lex Fridman: So, starting last December, people really began to realize the capabilities of systems like Claude Code, Codex, and OpenClaw. Honestly, I'm a bit embarrassed to admit it: at the airport, for the first time, I started talking to the computer to code, just like communicating with a colleague. I'm not sure what it will look like when everyone interacts with AI this way, but the efficiency is indeed very high.
Jensen Huang: What's more likely is that your AI will constantly "interrupt" you. Because it completes tasks very quickly, it will continuously give you feedback: "This is done, what's the next step?"
Lex Fridman: This is truly an incredible future.
Lex Fridman: I've seen you mention that your success is largely due to working harder than others and being able to endure more pain than others.
This "pain" actually encompasses many aspects, such as dealing with failure, the engineering challenges and cost issues we just talked about, as well as interpersonal issues, uncertainty, responsibility, fatigue, awkwardness, and those moments you mentioned when the company was on the brink of collapse.
But beyond that, there's also pressure. As the CEO of a company surrounded by governments and economies around the world, shaping resource allocation and AI infrastructure planning, how do you handle this kind of pressure? With so many countries and people relying on you, where does your strength come from?
Jensen Huang: I'm acutely aware that NVIDIA's success is important to the United States. We generate a significant amount of tax revenue, establish a leading technological position, and technological leadership itself is part of national security. A wealthier nation can better drive social policies. At the same time, we are also driving reindustrialization, creating a large number of job opportunities, rebuilding domestic manufacturing capabilities, including chips, computers, and AI factories. I am also very aware that many ordinary investors—teachers, police officers—have gained wealth from investing in NVIDIA. Additionally, NVIDIA is part of a vast ecosystem, with many partners upstream and downstream relying on us.
Faced with all this, my approach is very simple: break down the problem.
I ask myself, what is the current situation? What has changed? Where are the challenges? What can I do? Once the problem is broken down, it becomes a series of actionable tasks.
Then there's only one question left: Did you do it? Or did you have someone else do it? If you believe something must be done but neither did it yourself nor drove others to do it, then there's no point in complaining about it.
I am quite strict with myself. But at the same time, I also avoid panic by breaking down problems. I can sleep peacefully knowing that I have identified all the risk points and informed the relevant responsible parties. As long as things are progressing as they should, there is no need to be anxious.
Lex Fridman: Have you experienced psychological lows in this process?
Jensen Huang: Of course, many times.
Lex Fridman: And your method is still to break down the problem?
Jensen Huang: Yes. Another point is "learning to forget." In machine learning, there is an important ability called "selective forgetting." It's the same for humans—you can't carry everything with you. I quickly break down a problem and then spread the pressure around. Anything that worries me, I tell the relevant people as soon as possible rather than carrying it myself. Of course, you also need to be strict with yourself—don't get immersed in emotions, just keep moving forward.
Another thing is that you will be attracted to the "future." Like athletes, they only focus on the next point, not the mistake of the previous one.
Lex Fridman: You once said that if you had known how difficult NVIDIA was from the beginning, you might not have done it.
Jensen Huang: Yes. But what I want to express is: this almost applies to everything worth doing. You need a "childli
