Sequoia 2026 AI Annual Conference Highlights: 13 Top AI Players Tell You AGI Is Already Here

Bitsfull · 2026/05/07 16:44

Summary:

13 Fireside Chats from Silicon Valley's Premier AI Conference

Introduction


In late April 2026, Sequoia Capital hosted its fourth AI Ascent conference in San Francisco. The event brought together core AI companies such as OpenAI, DeepMind, Anthropic, NVIDIA, and Waymo, alongside startups focused on emerging directions such as ElevenLabs, XBOW, Recursive Intelligence, and Starcloud. The 13 dialogues covered foundation models, programming paradigms, robotics, autonomous driving, chip design, space-based compute, and novel computing architectures, touching nearly every cutting-edge trend in the AI industry.


Compared to previous years, this edition of AI Ascent had a more direct focus: AI is no longer just a tool for enhancing efficiency but is beginning to enter real workflows, taking over complex tasks that were previously achievable only by humans. In the opening keynote, Sequoia called this shift the advent of "Functional AGI": not a claim that machines now equal humans in all respects, but that, from a commercial and productivity standpoint, long-horizon intelligent systems have crossed the threshold from demonstration to usability.


This was also the core backdrop of the conference: as intelligence becomes more affordable, accessible, and scalable, the competitive focus of AI is shifting from "what models can do" to "how to integrate them into the real world." Software, services, organizations, hardware, energy, security, and physical space may all be redesigned as a result.


The story Sequoia sought to convey was very clear: intelligence is no longer a luxury but is transforming into a new industrial raw material. The next crucial step may not be who has the smarter models but who can more rapidly comprehend customers, restructure processes, schedule agents, and transform this inexpensive intelligence into sustainable business systems.


This conference therefore discussed not only the next steps in AI technology but a larger question: as machines take on more and more cognitive labor, how should humans, companies, and society redefine their value?


Main Themes Throughout the Conference


First, intelligence is becoming a commodity.
Sequoia likened this transformation to aluminum in the late 19th century: once more valuable than gold, it became a readily available industrial material within decades thanks to the widespread adoption of the Hall–Héroult electrolysis process. Today, the PhD-level expertise and cognitive barriers that historically defined middle-class competitiveness may be facing a similar fate. Advanced intelligence is no longer naturally scarce; it is starting to be mass-produced, invoked on demand, and distributed by models.


Second, bottlenecks are shifting from machines to humans.
Greg Brockman made a remark at this year's conference that was repeatedly cited: when agents can work autonomously, human attention will become the scarcest resource in the entire economy. Karpathy offered an even more direct version of the same idea: when machines can handle almost all implementation details, the one ability humans cannot afford to lose is figuring out what they truly want. The question is no longer whether machines can do it, but whether humans can set the right goals, assess the reliability of outcomes, and decide what is worth accomplishing.


Third, programming is being solved; the organization is not.
At Anthropic, a significant share of code is already generated by models, and different agents can even collaborate autonomously on Slack. Boris Cherny takes it further: the real moat is no longer a particular model version but the organization's alignment around AI. For incumbents, this is an unfriendly conclusion, because the gap comes not just from tool proficiency but from whether a company is willing to redesign processes, permissions, modes of collaboration, and management structures around the agent.


Fourth, AI is returning from the digital world to the physical world.
Jim Fan's robots, Waymo's 20 million autonomous rides, and ElevenLabs' emotional voice each indicate, from different angles, that AI is no longer just an on-screen tool for processing text, code, and images; it is starting to understand and intervene in light, sound, force, motion, and space. Over the past decade, "software ate the world" was the main theme; next, AI may enter the physical world directly, transforming cars, factories, robots, voice interaction, and physical manufacturing itself.


Fifth, the endgame of compute lies at the physical layer.
As ground-based data centers run up against limits in land, power, and cooling, a set of more aggressive companies is offering different solutions: Starcloud wants to send chips into space, Recursive lets AI autonomously design chips, Unconventional AI attempts to mimic the brain by bypassing the von Neumann architecture, and Flapping Airplanes questions brute-force scaling itself: if humans can learn the same skills with far less data, today's AI algorithms may be fundamentally inefficient. The finish line of the compute race is shifting from buying more GPUs to a bottom-up reconstruction of energy, chips, architecture, and data efficiency.


Sixth, security has entered the asymmetric battlefield of "AI vs. AI."
XBOW's intelligent agent topped the global white-hat hacker rankings, indicating that AI is no longer just an assistant for security researchers but an autonomous attack system capable of independently discovering, validating, and exploiting vulnerabilities. More alarmingly, as open-source model capabilities improve, this kind of attack capability may proliferate rapidly within the next 6 to 9 months. Cybersecurity is no longer a game of offense and defense between human hackers but an AI arms race whose countdown has already started.


Putting these clues together reveals that the AI industry in 2026 is in an uncomfortable position: technical capabilities have far outpaced product forms, organizational structures, and societal norms. The models are getting stronger every day, but the "containers" supporting them—whether they be enterprise processes, application interfaces, or human attention—have not kept pace.


The entire conference's discussions essentially revolved around one question: in a world where machines can perform more and more cognitive labor, what is left for humanity?


Sequoia's answer, somewhat counterintuitively, is emotion, trust, and those things that cannot be produced at scale. Brockman's answer is "what do you want," and Karpathy's answer is "can you judge if the machine is doing the right thing." These various answers ultimately point to the same thing: when intelligence itself is no longer scarce, intent, judgment, and relationships will become the new hard currency.


Below is a summary of all 13 dialogues from this conference.


Forum Summary


Keynote Speeches


Sequoia Partner Opening Keynote: This is AGI


Speakers Pat Grady, Sonya Huang, and Konstantine Buhler are the three core partners of Sequoia Capital's AI investment practice. Sonya Huang, the author of the viral 2022 article Generative AI: A Creative New World, is considered one of the earliest institutional investors to systematically bet on generative AI. The trio collectively authored the 2026 piece This is AGI, which forms the intellectual framework of this conference. Sequoia Capital itself is Silicon Valley's oldest top-tier venture capital firm, with early investments in companies like Apple, Google, NVIDIA, Stripe, and OpenAI.


AI is a "computing revolution" that fundamentally changes the nature of information processing, not merely a "communication revolution" that accelerates distribution. The Internet and mobile eras only changed the paths along which information travels, whereas AI changes the underlying logic of how information is generated, causing the technological foundation developers build on to shift daily. The significance of this assessment is that in such unstable "downpour moments," the traditionally stable tech stack is a thing of the past, and developers must learn to dance with an ever-evolving model foundation.


AI will penetrate a $10 trillion market, roughly ten times larger than traditional software, by directly delivering "professional services." The global software market's TAM (Total Addressable Market) is measured in the hundreds of billions of dollars, while the U.S. legal services vertical alone is a $400 billion market, a scale comparable to the entire software industry. This argues for a key transformation: AI's commercial value no longer lies in selling itself as a tool to humans but in directly taking over, in the form of an agent, high-value work previously delivered by human experts.


From a business perspective, the ability of long-duration agents to autonomously recover from failures signifies the arrival of AGI (Artificial General Intelligence). If a system can be dispatched to carry out a task, self-heal when it fails, and persist until completion, it is functionally equivalent to AGI. This assessment is a counterintuitive reminder: stop obsessing over academic definitions. AI with independent execution capability has evolved from a "faster horse" into a "vehicle" that changes the dimension of competition, achieving efficiency gains of 10 to 40 times.


In a moment of rapidly changing foundational capabilities, the sole logic behind building a moat is "extreme customer proximity." The MAD strategy—Moats, Affordance, and Diffusion—advocates for locking in value through a customer-back approach rather than a tech-out approach, emphasizing deep customer immersion over model chasing, as human needs evolve much slower than model capabilities.


The autonomy of agents is transitioning from "minute-level assistants" to "hour-level autonomous employees." The METR chart measuring how long models can stay on track in complex tasks has leaped from minutes a year ago to several hours now, enough to support dark factories that operate without human oversight. This breakthrough indicates that the productivity bottleneck has been removed, making extraordinary iterations like "rewriting 8 million lines of code in 6 weeks" the new normal.


Human society is on the eve of a "Cognitive Industrial Revolution," where machines will undertake 99.9% of global cognitive labor. Similar to how the Industrial Revolution replaced 99% of physical labor with engines, the vast majority of analysis, decision-making, and creation will also be undertaken by neural networks in the future. The proposition here is that intelligence will no longer be a scarce human resource but a low-cost industrial-grade consumable that can be infinitely scaled and on-demand.


Advanced cognitive skills are about to experience an "Aluminum Moment," transitioning from expensive luxury to cheap commodity. Just as aluminum, once more valuable than gold, became cheap and ubiquitous thanks to the widespread adoption of electrolysis, AI's instant access to PhD-level knowledge will have a similar effect. This foreshadows a harsh reality: knowledge barriers built up over years may collapse in an instant, and intelligence itself will no longer command a premium for scarcity.


Once intelligence is thoroughly commoditized, interpersonal relationships and emotional connections will become the sole true value anchor of human society. Just as photography pushed art away from realism toward soul-expressing Impressionism, AI's superhuman efficiency will push human value into spaces beyond what machines optimize for. The ultimate counterintuitive yet profound conclusion is this: in a future where machines handle all the work, only trust and emotion between humans will be the ultimate non-scalable hard currency that machines cannot produce.


If you could only remember one thing from this conversation, what would it be?


The valuable expertise that was once highly regarded will soon become as cheap as a plastic bag. In the future, what will truly keep you competitive is no longer a brain that can solve difficult problems, but an emotion that can understand others and build trust.


Models and Cognition


Andrej Karpathy: From Vibe Coding to Agent Engineering (OpenAI Founding Team)


Speaker Andrej Karpathy is the AI community's most influential "educator-scientist." A founding member of OpenAI, he later served as Tesla's AI Director, overseeing the autonomous driving visual system, before leaving Tesla in 2024 to found the AI education company Eureka Labs. His hands-on series of neural network tutorials on YouTube has served as introductory material for countless AI engineers. Key concepts such as "Software 2.0" and "Vibe Coding" are terms he coined.


Even top experts find themselves feeling "left behind" in the AI wave, as the technology has evolved from an auxiliary tool into autonomous systems. By early 2026, the speaker found he no longer needed to modify AI-generated code, merely trusting the system to perform complex tasks. The significance of this realization is that once AI can self-correct and close the delivery loop, the "baseline" that developers built on accumulated experience is sharply raised, and individual learning speed struggles to keep pace with the rate of technological change.


Modern computing is entering the era of Software 3.0, with the LLM fundamentally a new type of computer programmed through context. Software 1.0 was hand-written code, 2.0 was trained weights, and 3.0 is programming through prompts within the context (the model's working memory while processing information). This implies that installing software no longer requires writing complex compatibility scripts; feeding a piece of instructional text to the agent suffices, and precise syntax is no longer the core competitive advantage.


Many existing application architectures are becoming "redundant," as AI can now operate directly on the raw data layer. The speaker found that a menu-generation app he painstakingly developed became meaningless once the model could perform pixel-level rendering overlays directly on photos. This argues for a profound change: AI should not merely be used to speed up old business logic; we must recognize that the disappearance of the middle layer means many traditional product forms have lost their reason to exist.


The capability of AI exhibits a "zigzag" pattern, as it only demonstrates superhuman intelligence in verifiable domains. While a model can refactor a hundred thousand lines of code, it may stumble on simple common-sense questions like "how many r's are in 'strawberry'." This is because the model is primarily reinforced through RL (reinforcement learning, a training method that guides model evolution using reward signals) in verifiable fields such as math and code. This reminds us that we must constantly observe within the loop, being wary of weaknesses outside the model's training distribution.
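The "strawberry" example is a useful reminder that the ground truth in such cases is trivially verifiable with ordinary code, even when a model stumbles on it. A minimal illustration (`count_letter` is a name introduced here for demonstration, not from the talk):

```python
def count_letter(word: str, letter: str) -> int:
    # Character-level counting is trivial for a program, yet historically
    # tricky for LLMs, which see subword tokens rather than individual letters.
    return word.count(letter)

print(count_letter("strawberry", "r"))  # → 3
```

This gap between what is easy to verify in code and what a model answers reliably is exactly why verifiable domains like math and programming train up so much faster under RL.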


We are not building an "animal" with intrinsic motivation but rather "summoning a ghost" in the data distribution. The peak intelligence of a model depends on the training data distribution (such as adding a large amount of chess game data dramatically improving its playing ability), rather than generating some form of biological curiosity. This counterintuitive judgment points out that AI does not truly "understand"; it merely undergoes extreme reinforcement on specific circuits in a statistical simulation. Therefore, users must learn to identify and avoid false capabilities that lack data support.


Agentic engineering aims to maintain the quality threshold of professional software while harnessing AI's randomness. This novel engineering approach requires developers to ensure that the system remains free of security vulnerabilities when coordinating with unstable yet extremely powerful agents. It advocates for a new 10x engineer paradigm: the core of competition is no longer the speed of personally writing code but the ability to efficiently drive a large agent cluster like a director to deliver high-quality results.


Once machines take over trivial API details, human value will shift towards aesthetics and mastery of the "spec sheet." Developers will no longer need to memorize the specific interface parameters of PyTorch (a deep learning framework) as these details will be handled by AI "apprentices" with exceptional memory. This foreshadows a counterintuitive future: fundamental principles and design taste are more enduring than tool details, and humans should transition from being "bricklayers" to decision-makers defining "what constitutes good design."


"Thinking" can be outsourced, but "understanding" is the only bottleneck for humans in the age of cheap intelligence. Although AI can assist us in handling and recompiling massive amounts of information, it cannot decide for us "why to build this" and "whether it is valuable." This posits an ultimate conclusion: humans remain the sole commanders of the system because only human consciousness can imbue the intelligent processing with a goal, a global understanding that algorithms cannot replace.


If you could only remember one thing from this discussion, what would it be?


When machines can do all the work for you and even think about all the details, the only skill you cannot afford to lose is to understand what you truly want and whether you can tell if the machine is doing it right.


Greg Brockman: Human Attention is the New Bottleneck (OpenAI Co-Founder)


Speaker Greg Brockman is the co-founder and President of OpenAI. Former CTO of Stripe, he co-founded OpenAI with Sam Altman in 2015 and was the core architect of the company's technology and infrastructure. Within OpenAI, while Altman focuses externally (fundraising, public image, policy), Brockman focuses internally (technology, computing power, product). His hands-on coding and late-night deployment as an engineer are well known in Silicon Valley.


Intelligence has become a commoditized, standardized product, leading to insatiable demand for computing power. OpenAI's business model essentially involves purchasing or leasing compute, transforming it into intelligence through models, and selling it at a premium. Because demand for problem-solving is effectively infinite, spare GPU (Graphics Processing Unit) capacity in 2026 is predicted to be close to zero. The significance of this assessment is that AI is no longer just a software service but has evolved into a resource-commodity business, where the supply of physical-world compute directly determines the upper limit of civilization's intelligence.


The scaling law (an empirical rule stating that model capabilities increase with computing power) is a universal empirical truth, with no "ceiling" currently in sight. Although the basic concept of neural networks dates to the 1940s, as long as massive compute continues to be invested, the model's abilities will strengthen correspondingly and deterministically. This asserts a key point: as long as capital and electricity keep pouring in, technological stagnation will not occur in the short term, and we will attain ever greater intelligence. This is the underlying logic behind the tech giants' aggressive investments.
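The scaling law described above is usually stated as a power law: loss falls smoothly as compute grows, with no cliff in the functional form itself. A toy sketch of that shape (the constants `c_star` and `alpha` are illustrative placeholders, not fitted values from the talk or any paper):

```python
def scaling_law_loss(compute: float, c_star: float = 1e8, alpha: float = 0.05) -> float:
    """Illustrative power-law form L(C) = (C*/C)**alpha.

    Loss decreases monotonically as compute C grows; the curve flattens
    but never hits a hard floor, which is the intuition behind "no ceiling."
    """
    return (c_star / compute) ** alpha

# Each 100x increase in compute buys a further, smaller reduction in loss.
for c in (1e8, 1e10, 1e12):
    print(f"C={c:.0e}  loss={scaling_law_loss(c):.3f}")
```

The business takeaway in the paragraph above follows directly from this shape: returns diminish but never vanish, so more compute always buys more capability.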


From a functional perspective, we have already completed 80% of the journey to AGI (Artificial General Intelligence) because models now have the ability to independently perform tasks in a closed loop. After a systems engineer hands over a complex optimization task to the model, the model not only completes the code writing but also autonomously runs a Profiler and undergoes multiple rounds of optimization based on feedback until the task is fully completed. This posits a counterintuitive view: AGI is not a future moment but an ongoing process, where AI has evolved from a "code-writing assistant" to a "problem-solving colleague."


Context (referring to the background information the model holds when handling specific tasks) is now replacing model algorithms as the most critical competitive edge. The new tool, Chronicle, is able to record a user's every action on the computer in real time, allowing AI to have "memory," thus saving time that humans would spend repeatedly explaining context to the machine. The importance of this assessment is that for entrepreneurs, one-time model training is no longer the sole moat; building a "data bundle" that allows AI to deeply understand the user's business environment is the truly enduring asset.


As the cost of "execution" approaches zero, human attention will become the scarcest resource in the entire economy. When an agent can work autonomously, even proactively reporting progress to its manager on Slack because a task is taking longer than expected, human energy will shift entirely from "doing things" to "judging whether this aligns with my values." This judgment is deeply counterintuitive: the bottleneck is no longer machines computing too slowly, but human approval failing to keep up with machine output, turning humans into the system's speed limiters.


Traditional corporate organizational structures will be completely dismantled, giving rise to the era of the "individual enterprise," where one person directs thousands of agents. Individuals on the internet are already using top-tier models to solve mathematical problems that once required an entire research team, signaling a shift in the core of competition from stacking headcount to finding unique entry points. This foreshadows a brand-new power structure: future companies may be extremely flat, where anyone with foresight can command a vast cluster of intelligent agents like a CEO managing a hundred thousand employees.


AI is transitioning from the digital world to the physical world, initiating a renaissance in scientific research. Recently, an OpenAI model derived a physical formula that provides key evidence for physicists searching for quantum gravity, the attempt to unify microscopic quantum mechanics with macroscopic general relativity. The assertion is that AI is no longer just processing clean numerical symbols; it is learning to handle the complexity and messiness of the real world, heralding an era in which humans stand on the brink of a machine-led, or machine-assisted, scientific renaissance.


We will eventually bid farewell to this natural state of "submission" to machines and return to a goal-driven human-centered life. The human body was not designed to sit in front of a screen typing all day; future interactions will shift from inputting commands to expressing visions, enabling machines to act as servants to achieve our goals. This insight leads to a profound conclusion: the endgame of AGI is not to make humans more like machines but to have machines take on all non-human tasks, giving human time back to emotions and social interactions.


If you could remember only one thing from this conversation, what would it be?


When machines can do all the work for you, your only competitiveness and value will no longer be what you can do but rather what you truly want and whether you can judge if the machine is doing it right.


Demis Hassabis: The Three-Quarters Progress toward AGI (DeepMind CEO & 2024 Nobel Prize in Chemistry Laureate)


Speaker Demis Hassabis is the co-founder and CEO of Google DeepMind and a 2024 Nobel Prize in Chemistry laureate. A chess master in his youth, he moved into game design before earning a Ph.D. in cognitive neuroscience. DeepMind's achievements include AlphaGo (defeating Go world champion Lee Sedol), AlphaFold (solving the 50-year-old protein folding problem), and the Gemini model series, making him the only person today to both lead a major AI lab and hold a Nobel Prize.


At the foundation of the universe lies "information," not matter or energy. The speaker argues that there is an equivalence between matter, energy, and information, and that information processing is the most fundamental perspective for understanding everything (especially for living organisms that counteract entropy). The importance of this assertion is that it elevates AI from a mere computer technology to a meta-tool for exploring the essence of reality, implying that building AI is a reconfiguration of human understanding of the logic of the universe.


AGI is a goal-oriented, step-by-step "twenty-year scientific engineering" effort. When DeepMind was founded in 2010, it established the vision of "first solve intelligence, then use intelligence to solve everything," and its current development fully aligns with that initial foresight. This breaks the misconception that the AI breakthrough was accidental and asserts that the arrival of AGI is the inevitable result of long-term scientific planning, rather than a Silicon Valley-style stroke of luck or hype.


The fusion of Deep Learning and Reinforcement Learning (a method where machines learn strategies autonomously through feedback rewards) is the definitive path to AGI. In the early years, the academic community compartmentalized these two approaches, but the speaker insists that this integration allows AI to learn universal logic from games without prior human knowledge. The assertion here is that by "synthesizing" the strengths of different technical domains, AI can leap from solving simple puzzle games to handling the infinite complexity of the real world.


AI will replace traditional mathematics as the "underlying descriptive language" for complex emergent systems like biology. While mathematics can perfectly describe physical laws, it lacks expressive power when faced with systems like biology, which are full of weak signals and messy data. This counterintuitive assertion points out that there is no need to insist on describing life with concise equations; by simulating complex interactions, AI can directly extract natural laws that humans cannot intuitively grasp.


The success of AlphaFold marks AI's achievement of a "paradigm-shifting transfer" in the field of life sciences. This tool has solved the 50-year-old challenge of protein folding that has plagued humanity, allowing drug development to potentially shift from the traditional Wet-lab (referring to lab experiments relying on chemical reagents and physical tests) model to digital simulation. This means that future drug development may no longer take 10 years, but could be shortened to days or even hours, liberating humanity from the laborious and inefficient process of biological trial and error.


High-precision simulators will transform social sciences into "hard science" that can be experimented on repeatedly. By learning world models to construct simulation environments, humans can sample economic policies or environmental energy issues thousands of times without disturbing reality. This proposes a counterintuitive future where decisions such as interest rate adjustments, originally fraught with uncertainty, will become predictably precise like engineering experiments, significantly reducing the risk cost of social governance.


Before debating whether machines are conscious, they should first be developed into highly accurate "super research tools." The speaker advocates first using AGI as an "intelligent telescope" to observe and define consciousness and self-awareness in the human brain. The significance of this judgment is that it sets a rational research priority: first solve the productivity bottleneck, then use the enhanced cognitive abilities to tackle the most profound philosophical questions of human civilization.


Humanity is in the final quarter of the AGI journey, with 2030 marking a milestone in civilization's evolution. From early chess games to now closing the loop in protein structure research, AI has demonstrated its ability to handle extremely complex and highly uncertain tasks. This advocates for an urgent judgment: the countdown to AGI's arrival has begun, and we are in the final sprint of a 20-year marathon. Society must prepare for a comprehensive transformation in the next five years.


If you could remember only one thing from this conversation, what would it be?


We are in the final sprint towards superintelligence. The ultimate goal of AI is not to mimic human conversation but to become the strongest scientific engine to help humans invent new drugs in days or unlock the secrets of the universe.


Programming and Organizational Transformation


Anthropic's Boris Cherny: Programming Solved, Next Level is Organization


Speaker Boris Cherny is the creator of Claude Code under Anthropic. Claude Code is a command-line programming tool released in 2025 and is considered one of the most powerful AI programming assistants by the developer community, key to igniting the concept of "agentic engineering."


The biggest obstacle in current software development is the "Product Overhang," where the UI lags behind the model's capabilities. Past code assistants could only do simple single-line completions, but current models are fully capable of taking over entire end-to-end development tasks. This means developers must shift from "fixing old interfaces" to building agentic new products (ones where the model autonomously performs multi-step tasks and perceives its environment); otherwise, humans will never unleash AI's true productivity potential.


For top developers, the era of manually writing code, the "craftsman era," has come to an end. The speaker achieved 100% code generation by the model through Claude Code, setting a personal record of completing 150 PRs (Pull Requests) in a single day. This leads to a counterintuitive conclusion: AI is no longer your "co-pilot" but a main driver capable of independently delivering results, transforming humans in engineering from "bricklayers" into "project reviewers."


In the era of AI explosion, the key to success is to develop products for the "next-generation model" rather than to accommodate the status quo. Claude Code did not achieve PMF (Product Market Fit) in its first six months, until the release of the more powerful Opus 4 model transformed the product experience. This demonstrates that entrepreneurs must anticipate and wait for a leap in intelligence, as such a "capability mutation" can instantly turn an ordinary tool into an industry-shaping weapon.


Loop (referring to letting the model run autonomously on a schedule and provide feedback) will replace dialog boxes, becoming the ultimate paradigm of human-machine collaboration. Models can now use cron (a system tool for scheduling recurring tasks) to autonomously schedule repetitive tasks, such as automatically fixing test errors every 30 minutes, completing code refactoring, or organizing user feedback. This means that future workflows will no longer depend on humans constantly staring at screens and giving instructions, but will establish a 24/7 self-running, unsupervised digital team of experts.
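The loop pattern described above can be sketched as a single scheduled cycle: run the test suite, hand any failure to an agent, then re-test. A minimal illustration (the `run_tests` and `fix_with_agent` callables are hypothetical stand-ins, not Claude Code's actual API):

```python
from typing import Callable

def run_agent_cycle(run_tests: Callable[[], bool],
                    fix_with_agent: Callable[[], None]) -> str:
    """One unattended cycle: test, let the agent fix failures, then re-test."""
    if run_tests():
        return "green"           # suite passes; nothing to do this cycle
    fix_with_agent()             # agent attempts an autonomous fix
    return "fixed" if run_tests() else "needs-human"

# In production, cron would invoke this every 30 minutes, e.g. a crontab entry:
#   */30 * * * * python run_cycle.py
```

The key design point is that the human appears only on the "needs-human" branch; the other two outcomes complete without supervision, which is what turns a dialog-box tool into a 24/7 digital teammate.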


AI is eliminating the barrier of a single technology stack, giving rise to interdisciplinary "super generalist talents." Within the Anthropic team, whether it is the CFO, designer, or researcher, everyone is using agents for professional programming development. This foreshadows a shift in the professional paradigm: mastery of a specific programming language's "technical depth" will rapidly depreciate, while possessing "cross-disciplinary breadth" with product insight, design aesthetics, and industry knowledge will become the most core and scarce resource in the future.


Traditional software's business moat is facing a complete collapse in the face of AI's agency. Models now possess strong hill-climbing capability (iteratively self-improving until a goal is reached) and can autonomously understand and execute complex business processes. This implies the arrival of a "SaaS reckoning": software that survives solely on process automation will lose its value, as AI can generate a customized alternative for anyone, at any time, based on the user's goals.
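As a rough illustration of what "hill climbing" means here, a minimal sketch: mutate a candidate, keep the mutation only if it scores at least as well, and repeat until the goal is approached. The scoring and mutation functions below are toy stand-ins, not anything from the talk.

```python
import random

def hill_climb(score, candidate, mutate, steps=200, seed=0):
    """Keep a mutation only if it scores at least as well as the
    current best -- iterate-until-goal, the essence of hill climbing."""
    rng = random.Random(seed)
    best, best_score = candidate, score(candidate)
    for _ in range(steps):
        nxt = mutate(best, rng)
        s = score(nxt)
        if s >= best_score:
            best, best_score = nxt, s
    return best, best_score

# Toy goal: maximize -(x - 3)^2, whose optimum sits at x = 3.
score = lambda x: -(x - 3.0) ** 2
mutate = lambda x, rng: x + rng.uniform(-0.5, 0.5)
```

An agent doing the same thing against a business goal ("reduce invoice errors") rather than a numeric one is the capability the paragraph describes.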


Coding is experiencing its "printing press moment," transitioning from an elite skill to a widely accessible form of literacy. Just as the advent of the printing press in the 15th century raised literacy rates from 10% to 70%, AI will make programming as simple and natural as sending a text message. The profound point here: the best candidates to write financial software in the future will no longer be programmers but accountants, who understand the business logic best; handing power to those with domain knowledge is the most thorough form of democratization.


The true long-term competitive advantage of an enterprise is no longer the model version, but the degree of "AI nativeness" in its organizational structure. Internally, Anthropic has already achieved mutual communication and autonomous collaboration between different Agents on Slack, completely abandoning the old organizational processes of manual code writing. This reveals a harsh truth: the gap between you and the frontrunners lies not in whether you have a model, but in whether you are willing to completely start over to adapt to the speed of AI, restructuring the company's operational logic.


If you can only remember one thing from this conversation, what would it be?


In the future, coding will become as simple as sending a text message, and everyone will be able to easily create an app. At that time, the most valuable thing will no longer be whether you can code or not, but whether you truly understand the industry.


The Physical World and Interfaces


Jim Fan of NVIDIA: The Endgame of Robotics


Speaker Jim Fan is a senior researcher at NVIDIA and head of its robot AI project (Project GR00T). An early OpenAI team member and a Ph.D. from Fei-Fei Li's lab at Stanford, he is one of the most renowned researchers in the field of robot foundation models. Active on Twitter, he is often seen as the robotics version of "Karpathy": conducting research while also serving as an industry evangelist.


Robotics must "copy LLM's homework" and take the next frame prediction of the physical world as the core logic of evolution. Just as language models have mastered human thinking by predicting the next token (text fragment), robots should also learn the laws of reality by predicting the physical world state. The argument behind this judgment is that we should no longer handwrite rules for robots but rather treat it as a generative problem, allowing the robot to spontaneously develop intelligence through "simulating the evolution of the physical world."


We must replace the existing "top-heavy" visual language model with the WAM (World Action Model). Current VLMs (Visual Language Models) excel at understanding nouns and knowledge but lack intuitive understanding of physical laws and verbs (such as moving a cup). The importance of this judgment lies in WAM treating vision and action as "first-class citizens," enabling robots to have the ability to "anticipate the future seconds and act accordingly," thus solving complex tasks never seen during training.


Large-scale video pre-training is actually a cheap substitute for some kind of "physical simulator." In the process of predicting massive video pixels, the model spontaneously learns complex physical properties such as gravity, buoyancy, light reflection, without any manual programming. This proposes a counterintuitive conclusion: we do not need precise physical equations; we just need to let AI watch enough "video slop," and it can subconsciously build real physical intuition.


Teleoperation (remote operation, where a person wears a device to manually control a robot) is becoming the biggest obstacle to scaling up robot production. Due to the limitations of human experts' physical endurance, this expensive and painful data collection method has a hard limit of "24 hours per robot per day." The argument behind this judgment is that we must break through the bottleneck by using Sensorized human data to allow robots to learn directly from human daily behavior, rather than relying on expensive "hands-on teaching."


Robot dexterity also follows the Scaling Law, where intelligence depends on the hours of pre-training. Research has found a clear logarithmic relationship between a robot's task success rate and the duration of first-person view video training input. The significance of this finding is that it demonstrates that "robot intelligence" is no longer an unquantifiable black box but rather an expected function of compute power and data, achieving an exponential leap in capability through the input of tens of millions of hours of video data.
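The claimed logarithmic relationship can be made concrete with a toy curve of the form success ≈ a + b·log10(hours). The coefficients below are invented for illustration, not figures from the research.

```python
import math

def predicted_success(hours, a=0.10, b=0.08):
    """Hypothetical scaling law: success rate grows linearly in
    log10(training hours), capped at 1.0. Coefficients a and b are
    illustrative assumptions, not measured values."""
    return min(1.0, a + b * math.log10(hours))

for h in [10, 1_000, 100_000, 10_000_000]:
    print(f"{h:>10} hrs -> {predicted_success(h):.2f}")
```

The shape is what matters: each 10x increase in data buys a fixed increment in success rate, which is why "tens of millions of hours of video" is the regime where capability jumps become visible.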


Future training environments will shift from classical physics engines to data-driven "neural simulators." Traditional simulators require manual modeling, while technologies like Dream Dojo can generate sensory states directly from action signals, achieving "computation as environment." This means we no longer need to build a million physical laboratories; instead, with sufficient inference compute, AI can run tens of millions of parallel reinforcement learning iterations in its "dreamscape," dramatically reducing R&D costs.


Through the Physical API, robots will be able to be commanded and configured like software applications. Future factories will evolve into "lights-out factories," where inputting a Markdown file describing a product design will enable a robot cluster to autonomously coordinate and print physical products at the atomic level. This proposition foretells a counterintuitive future: hardware manufacturing will no longer be an asset-heavy industry but rather a standardized service that can be flexibly scheduled by software.
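Purely as a thought experiment, the "Markdown file in, product out" idea might look like the following toy sketch: parse a tiny spec and fan the job out to a mocked robot cluster. Every name, format, and function here is invented for illustration.

```python
def parse_spec(markdown: str) -> dict:
    """Read a tiny product spec: one '# name' heading plus 'key: value'
    lines. A toy stand-in for whatever a real Physical API would accept."""
    spec = {"name": None, "params": {}}
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("# "):
            spec["name"] = line[2:]
        elif ":" in line and not line.startswith("#"):
            k, v = line.split(":", 1)
            spec["params"][k.strip()] = v.strip()
    return spec

def dispatch(spec: dict, n_robots: int) -> list:
    # Fan the build out to a mocked robot cluster.
    return [f"robot-{i}: build {spec['name']}" for i in range(n_robots)]

spec = parse_spec("# widget\nmaterial: aluminum\nqty: 500")
```

The design point is the interface, not the parser: once a declarative spec is the unit of work, manufacturing capacity becomes schedulable the way cloud compute is.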


By 2040, we will witness physical-level self-research, where robots embark on the ultimate process of "self-iteration." When robots can autonomously design, improve, and manufacture the next generation of robots, humans will disappear entirely as a bottleneck in technological evolution. This assertion rests on the exponential nature of technological development: we are at the final stop of the roboticists' "civilizational evolution tree," and this leap will be faster and more intense than the jump from cat/dog recognition to AGI.


If you could remember one thing from this conversation, what would it be?


Previously, robots needed hands-on training from humans; in the future, they will only need to watch thousands of hours of humans working in videos to learn all the complex skills and begin manufacturing themselves.


Waymo CEO Dmitri Dolgov: The 20 Million-Ride Journey of Autonomous Driving


Speaker Dmitri Dolgov is the Co-CEO of Waymo and a technology-first leader. Russian-born, he was a core member of Stanford's team in the DARPA Grand Challenge (an early autonomous driving competition organized by the U.S. Department of Defense). He joined the Google self-driving project in 2009 and has been the chief architect of Waymo's technology roadmap. Having weathered the ups and downs of nearly two decades in the autonomous driving industry, he is one of the few veterans to have persisted from day one to a scale of 20 million rides.


Waymo is Alphabet's (Google's parent company) self-driving company, started as a secret Google project in 2009 and spun out as an independent entity in 2016. It is currently the only company globally operating a fleet of robotaxis at scale in multiple cities without safety drivers, with over 20 million driverless rides completed. Its technological approach differs from Tesla's, emphasizing LiDAR + HD mapping + modular architecture.


The self-driving industry faces the fallacy of "easy to learn, hard to master," where early explosive growth often masks the brutality of long-tail challenges. Many teams become overly optimistic after initial technical breakthroughs, but Dmitri believes that this "sweet then bitter" characteristic makes transforming technology into a truly safe, superhuman product extremely difficult. The crux of this judgment lies in: AI's real-world deployment threshold isn't the first 90% of functional demos, but the ability to persevere in the remaining 10% of complex edge cases, which is the fundamental reason most competitors disappear.


In an area involving human lives, "safety" must be an inviolable foundational belief, not a negotiable feature. Globally, someone dies in a car accident every 26 seconds, prompting Waymo to consider safety as a non-negotiable foundation from day one of architecture design. This advocates for a counterintuitive conclusion: in Silicon Valley's culture of speed and disruption, only those "patient" companies that establish an extremely high safety threshold can survive the technology disillusionment phase and ultimately earn public trust.


A purely end-to-end learning architecture is insufficient to meet extreme safety demands; structured representations must be introduced to reinforce it. While Waymo also uses E2E (End-to-End, a single model running directly from sensor input to decision output), it adds a structured intermediate representation layer that enables real-time validation at runtime. The importance of this judgment lies in breaking the blind worship of "bigger models are better," advocating instead for architectural rigor that keeps AI decisions interpretable and thereby achieves superhuman safety.


A true AI driving system should be a closed-loop ecosystem integrating driving, simulation, and evaluation. Waymo's Foundation Model simultaneously drives the three core pillars of the driver, simulator, and evaluator, enabling the system to understand the dynamic principles of the physical world. This advocates a core viewpoint: AI's evolution should not solely rely on external road testing but should achieve "self-evolution" through internal physical simulation, exhausting all extreme scenarios unseen by humans in virtual space.


AI can demonstrate "foresight" that surpasses human perception by capturing faint physical signals. Waymo once used LiDAR (Light Detection and Ranging, a sensor that uses laser pulses to measure object distances) to capture an extremely weak reflection of a foot beneath a bus, anticipating and avoiding a pedestrian outside the line of sight. This counterintuitive judgment proves that AI is not merely mimicking human driver intuition but using a perception dimension that transcends human physical limits, constructing an almost X-ray-like God's-eye view to ensure safety.


Autonomous driving technology has completed the leap from lab to infrastructure and is entering an exponential expansion of its commercial ecosystem. While it took Waymo 8 years to offer services in its first 4 cities, it recently launched in 4 new cities in a single day, with order volume doubling in 7 months and total rides surpassing 20 million. This indicates that the technology has achieved a high level of generality: it no longer requires extensive tuning for each new city, and autonomous driving is now replicating across regions as quickly as a software update.


Once the AI completely solves the "driving" task, the ultimate competition for cars will be the passenger's spatial experience. The 6th generation Waymo hardware is entirely designed around the passenger experience, abandoning the driver-centric layout to create a "mobile living room" with automatic sliding doors. This proposition advocates for a fundamental transformation of the business logic: future cars are no longer mere driving tools but physical containers of service, where the core value will shift from "how to get there" to "how to spend time on the road."


The social dividend brought by AI should be measured by the "hard metric" of lives saved as the ultimate yardstick. Data shows that Waymo is 13 times safer than human drivers, meaning that at scale it can prevent one additional death from a severe accident every 8 days. The counterintuitive aspect of this view is that while we tend to focus on the convenience AI brings, its real leverage lies in a stability that surpasses humans, effectively offsetting the fatal weaknesses of human drivers.
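A back-of-envelope check of the "one life every 8 days" figure: given a 13x safety multiple and an assumed human fatality rate of roughly 1.3 deaths per 100 million miles (a ballpark US figure, not a Waymo number), one can solve for the weekly fleet mileage such a claim implies.

```python
def implied_weekly_miles(days_per_life, safety_multiple, human_fatals_per_100m=1.3):
    """Weekly fleet mileage implied by 'one extra life saved every N days',
    under an assumed human-driver fatality rate (rough US ballpark)."""
    saved_per_week = 7 / days_per_life
    # AV fatalities are human/safety_multiple, so lives saved = human * (1 - 1/m).
    human_per_week = saved_per_week * safety_multiple / (safety_multiple - 1)
    return human_per_week * 100_000_000 / human_fatals_per_100m

print(f"{implied_weekly_miles(8, 13):,.0f} miles/week")
```

With these assumptions the claim implies roughly 73 million fleet miles per week, i.e. it presumes operation at very large scale; swap in other rates to see the sensitivity.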


If there is one thing to remember from this conversation, what is it?


Self-driving cars are now 13 times safer than human-driven cars, and they are rapidly expanding into more cities, rendering driving a thing of the past.


ElevenLabs Founder: Voice Becomes the Primary Interface for AI


ElevenLabs is the world's best-known AI voice synthesis company, founded in 2022 by two Polish founders, Mati Staniszewski (former Palantir strategist) and Piotr Dabkowski (former Google ML engineer). The entrepreneurial inspiration came from Poland's tradition of dubbing every character in a foreign film with the same male voice. Its voice cloning and emotionally expressive speech synthesis currently lead the industry and are widely used in audiobooks, podcasts, and cross-language translation. Its most famous demonstration is a video of Argentine President Milei in which his voice remained consistent across languages. Valued at approximately $3.3 billion by 2026.


Audio was a long-overlooked AI niche, allowing rapid advancement at relatively low computational cost for teams that dug deep into it. During the 2022 large-model showdown, most players focused on text or vision, while audio's compute demands were comparatively low, enabling startups to grow independently. The point of this assessment is that entrepreneurs do not need to join a costly compute arms race; by pinpointing vertical domains whose technological threshold has not yet been flattened by big players, they can establish an early advantage through high R&D efficiency.


Emotions and non-verbal cues (such as laughter and pauses) are the key to escaping the "uncanny valley," not just faithful textual rendering. ElevenLabs recreates breathing rhythm and natural laughter to move its models from mechanical narration to human-like expression. The importance of this insight: sound is essentially an emotional carrier. Merely replicating timbre only solves the problem of surface "likeness"; only by reproducing the interaction cues humans cannot consciously describe can genuine trust between humans and machines be established.


The evolution endpoint of an Agent is to have "emotional intelligence," being able to adjust communication strategies in real-time based on the interlocutor's state. Researchers are developing an interactive model that can identify user stress and provide a reassuring tone of voice, enabling machines to learn to match the interlocutor's pace and emotions. This advocates for a counterintuitive shift: voice interaction is no longer about cold command execution, but about a psychological resonance, meaning that future voice AI will have a more stable empathy than humans to handle extreme conflicts.


Audio General Intelligence will bridge the gap between speech and music, achieving seamless transitions within a single full-modal audio stream. An ideal model should be able to move naturally from reading aloud to singing in one continuous stream while maintaining tonal and personality consistency. This advocates a technological leap: audio is no longer a collection of disparate tools but a unified creative engine, and this continuity will fundamentally change the production paradigm of podcasts, film post-production, and immersive entertainment.


Voice agents are transitioning from a "cost-saving tool" to a "revenue-generating tool," directly reshaping a company's revenue growth curve. Companies like Deliveroo have already used voice agents to automatically contact restaurants and uncover potential business opportunities in inbound sales calls. This asserts that the commercial value of voice AI is no longer about replacing customer service to cut costs, but rather about driving business growth through 24/7 uninterrupted proactive communication and data analysis, becoming a sales pioneer.


Voice will become the main gateway connecting human intelligence to everything, especially in a future where humanoid robots are widespread. As robots and smart devices of all kinds surround humans, voice is the most natural mode of instruction and interaction. The importance of this insight: voice is not a supplement to screen interaction but the natural entry point to complex intelligence; mastering the voice interface is like holding the ultimate remote control for the physical world.


The future core efficiency of an enterprise depends on whether it can embed engineering resources inside non-technical teams such as legal and finance. ElevenLabs, with only 400 people, insists on assigning dedicated engineers to its legal and operations teams to build automation systems. This advocates an organizational transformation: in the AI era, non-technical staff must also learn to vibe code (using AI tools to quickly write code) to clear away mundane tasks, while engineers connect these scattered automations into robust business systems.


In a future oversaturated with AI, identity verification will be scarcer than content generation, and trust will shift from the voice itself to secure certificates. When anyone can perfectly replicate a voice, we will need a watermark mechanism to prove that the person on the other end is really you. This leads to a counterintuitive conclusion: we no longer need to painstakingly distinguish AI but rather need a set of certification standards for "trustworthy AI." The most valuable aspect in the future will no longer be your voice but rather your authorization credentials for that voice.
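One hedged sketch of what "trust shifts to credentials" could mean in practice: an HMAC-signed authorization token binding an identity to a scope of voice-agent actions. This scheme is invented for illustration (standard library only); real identity or watermarking standards would look different.

```python
import hashlib
import hmac
import json

def issue_credential(secret: bytes, identity: str, scope: str) -> dict:
    """Sign a claim like 'alice authorizes her voice agent for bookings'.
    A toy stand-in for a real identity/watermarking standard."""
    claim = {"identity": identity, "scope": scope}
    payload = json.dumps(claim, sort_keys=True).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "sig": sig}

def verify_credential(secret: bytes, cred: dict) -> bool:
    payload = json.dumps(cred["claim"], sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred["sig"])

cred = issue_credential(b"server-secret", "alice", "restaurant-booking")
```

The property that matters is exactly the one in the paragraph: the receiving side no longer judges whether the voice sounds right, only whether the credential verifies.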


If you could remember only one thing from this conversation, what would it be?


In the future, the authenticity of the speaking voice will no longer be crucial. What matters is whether you can prove that the AI making that restaurant reservation or attending the meeting truly represents you.


Security Frontier


XBOW: The Rise of Autonomous AI Hackers


XBOW is an AI cybersecurity startup that has developed an AI agent capable of autonomously discovering and exploiting vulnerabilities. In August 2024, XBOW's agent topped the leaderboard on the world's largest white-hat hacker platform, HackerOne, marking the first instance of AI surpassing top human hackers in live competition. Its "Alloy" strategy, alternately invoking different models such as Claude and Gemini at each step of an attack, is a hallmark engineering practice in the field.


Cybersecurity has evolved from a duel of human skills into a competition of system optimization, and traditional defense models face a devastating blow. In 1575, Japan's Oda Nobunaga used massed volley-fire gun tactics to systematically defeat seemingly invincible samurai cavalry, much as today's AI systems are dismantling old defenses that rely on human expertise. The assertion is that the nature of security competition has fundamentally shifted: it is no longer about who has the most brilliant hacker but about who can be first to fully automate the defense system with AI.


Even the most tightly guarded top-tier systems are rendered defenseless in the face of inexpensive, efficient autonomous AI. Given nothing but a URL, XBOW's AI managed to find an RCE (Remote Code Execution) flaw, the most critical class of vulnerability, in Microsoft Bing at a cost of about $3,000. This underscores a counterintuitive truth: even fortresses hardened against the world's hackers become illusions when faced with a tireless AI that can automatically conduct reconnaissance and prioritize targets.


AI now possesses operational capabilities that surpass those of the world's top human hackers, rather than merely serving as an auxiliary tool. On HackerOne, a crowdsourced security testing platform connecting businesses with security researchers, XBOW's bot achieved the top global ranking solely through black-box testing (attacking without knowledge of the internal code). This shatters the myth that machines cannot handle complex, creative attacks and proves that AI has evolved from being a mere assistant providing suggestions to an "autonomous warrior" capable of independently delivering attack outcomes.


Through the "Alloy Model" strategy, AI is able to achieve an evolutionary effect where 1+1>2 through self-correction. XBOW alternates between different models such as Gemini and Sonnet in each step of the attack action (Alloy mode), leveraging the differences between the models to compensate for each other's logical errors. The significance of this approach lies in the fact that the path to the most powerful hacker AI doesn't necessarily have to wait for the emergence of a single perfect model. By orchestrating existing models through a sound engineering architecture, the collective destructive power can far exceed that of a single model.


The real security threat comes from "exploitable real-world impact" rather than theoretical vulnerabilities discovered through code audits. Traditional white-box testing (analysis conducted with source code access) often only lists numerous vulnerabilities without being able to determine if they can actually be exploited by malicious actors. In contrast, autonomous AI can provide clear answers through real-world simulation. This advocates for a key transformation: defenders must stop struggling in a sea of "false vulnerability reports" and instead focus on those critical points that can truly lead to server takeover.


The survival window for vulnerability patches has been completely closed, with attack activities now occurring before vulnerabilities become public knowledge. Previously, there was a two-year lag from when a vulnerability was listed in the CVE (Common Vulnerabilities and Exposures) to when it was exploited. Today, this timeframe has turned into a "negative" number, meaning vulnerabilities are being exploited in bulk by AI before official confirmation. This leads to an urgent conclusion: defense strategies relying on "patch waiting" have collapsed, and proactive automated defense has become the only way out.


The rise of AI is not the end of the cybersecurity industry but a radical reshaping of the value of defense. Faced with AI-driven automated attacks, predictions of traditional cybersecurity's demise do not hold up, because society needs AI-driven defense mechanisms to counter AI attacks more than ever. This assertion argues that we are in a survival arms race whose only antidote is to arm human researchers with stronger AI, finding every vulnerability before malicious actors act.


Society has less than a year left to patch up the global digital infrastructure, or else face catastrophic consequences. Due to advancements in Open-weight models (AI models with disclosed weights that can run locally), the most formidable autonomous hacker capabilities will be globally available within 6 to 9 months. This counterintuitive assertion serves as a final ultimatum: if automated defense is not achieved within this extremely short window, the global internet system will soon face an unprecedented security winter.


If you could only remember one thing from this conversation, what would it be?


Today's AI can automatically breach top websites as well as the world's best hackers can, and this capability will become widely available in less than a year. If you don't quickly use AI to automatically patch your vulnerabilities, your systems will soon be completely compromised.


Compute Power and Hardware Edge Betting


Recursive Intelligence: The Automation Revolution of AI Chip Design


Recursive Intelligence is an AI chip design company founded by Anna Goldie and Azalia Mirhoseini. The two previously co-invented AlphaChip at Google Brain—a system that uses reinforcement learning to automatically design chip layouts, which has been applied to Google's fourth-generation TPU and Pixel phone chips. They are trying to turn "AI chip design" into an industry-level transformation akin to TSMC initiating the fabless era—introducing the "Designless" concept, allowing customers to simply submit workload requirements, and the platform automatically generates a manufacturable chip design.


Human experts have become a drag on chip iteration. The current physical design and logic validation each take a year and involve thousands of experts, leading to significant commercial losses. In the current moment where every day of delay in the NVIDIA Blackwell chip means a $225 million opportunity cost, traditional design patterns relying on human expertise are becoming the biggest obstacle to AI progress.


An AI-driven recursive evolution loop at the physical level must be initiated. By optimizing chip design with AI, then using a stronger chip to train stronger models, the complete decoupling of software and hardware will be achieved. The proposition of this "recursive self-improvement" is that chips should not just be static fuel but should become dynamic execution endpoints in the AI evolutionary chain that can automatically adjust based on software feedback.


AI has shown "superhuman" capabilities in complex layout tasks. The chip layouts generated by the AlphaChip age