Sam Altman's Latest Interview: Why Did OpenAI Break Up with Microsoft?

Bitsfull · 2026/04/29 14:10

Summary:

After Microsoft Azure Exclusivity Ends, OpenAI Aims to Bring AI Agents to Enterprise Workflows


Editor's Note: On April 27, OpenAI and Microsoft amended their partnership agreement: Azure will no longer exclusively host OpenAI models, and OpenAI can now expand its products to AWS and other cloud platforms.



From the outside, this may look like a simple change in cloud distribution channels; but judging from the discussion between Sam Altman and AWS CEO Matt Garman, the more important change is that AI is moving from "model inference" to "enterprise workflows."


This article is a translation of Stratechery's technology-business interview with Sam Altman and Matt Garman, occasioned by OpenAI and AWS's launch of Bedrock Managed Agents. It covers the parallels between the cloud-computing and AI platform shifts, the challenges of deploying enterprise-grade agents, the difference between AgentCore and the managed service, and AWS's position in the AI infrastructure race.



The core of Bedrock Managed Agents is not merely letting AWS customers use OpenAI models; it embeds those models into AWS's native identity, permissions, logging, governance, deployment, and security systems. In other words, what enterprises truly need is not a smarter chatbot but a system of "virtual colleagues" that can run inside the organization, access data, perform tasks, and respect permission boundaries.


This is also the most noteworthy aspect of this collaboration: the focus of the AI competition is shifting from "who has the strongest model" to "who can turn the model into usable enterprise infrastructure." In a personal developer scenario, Codex can rely on the local environment to solve many complex problems; but in an enterprise scenario, the agent must deal with databases, SaaS, permission systems, security boundaries, and compliance requirements.


In a sense, this collaboration is also a replay of the early logic of cloud computing. AWS lowered the startup costs for startups, allowing small teams to build internet products without having to self-host servers; now, OpenAI and AWS are trying to lower the barrier for enterprises to deploy AI agents, enabling companies to integrate AI into real business processes without having to stitch together models, permissions, data, and security systems. The difference this time is that the pace is faster, and enterprise demands are more urgent.


Therefore, this article is not really about OpenAI "listing" its models on AWS, but about AI infrastructure entering the next phase: models, cloud, data, and enterprise permission systems are beginning to be deeply integrated. The future competition may not only be about API prices, chip performance, or model rankings, but about who can build an AI platform that allows enterprises to use it with confidence, scale continuously, and truly perform work.


Below is the original text:


Introduction


Good morning. As I mentioned yesterday, today's Stratechery interview comes a bit ahead of my usual schedule (moved from Thursday to Tuesday); the delivery itself, however, is a bit later than usual (from 6 a.m. ET to 1 p.m. ET) because of an embargo on the topic.


Over the past few days, that embargo put me in a somewhat delicate position: last Friday, I interviewed OpenAI CEO Sam Altman and AWS CEO Matt Garman about Bedrock Managed Agents supported by OpenAI. Naturally, one question I raised was: how does this collaboration square with the agreement between OpenAI and Microsoft under which Azure had exclusive access to OpenAI's models?



On Sunday evening, I heard through the grapevine that Microsoft was going to announce something on Monday morning. At that moment, I was wondering if it would be a preemptive lawsuit!


Come Monday, Microsoft and OpenAI announced that they had revised their agreement to allow OpenAI to offer its products on other cloud providers, including AWS.


And that brings us to this interview today.


I believe this new arrangement between Microsoft and OpenAI makes sense for both parties. Here are the key points of the new agreement as outlined in Microsoft's official post:


· Microsoft remains OpenAI's primary cloud partner; OpenAI products will be first available on Azure unless Microsoft cannot support or chooses not to support certain necessary capabilities. OpenAI can now offer its full suite of products through any cloud provider.

· Microsoft will continue to have a license for OpenAI's models and related IP until 2032, but Microsoft's license will no longer be exclusive.

· Microsoft will no longer pay revenue share to OpenAI.

· OpenAI's revenue share to Microsoft will continue until 2030, unaffected by OpenAI's technical progress, with a fixed rate and a total cap.

· As the primary shareholder, Microsoft will remain directly involved in OpenAI's growth.


I believe the most crucial point is the last one. Previously, Azure had a real competitive edge as the only cloud provider that could offer OpenAI's models at scale. But that exclusivity also constrained OpenAI, especially as more enterprises insist on accessing models from the cloud platform they already use. I have emphasized many times that this is a significant competitive advantage for Anthropic. In other words, Azure's exclusivity was actually hurting Microsoft's investment in OpenAI. Given Anthropic's rapid growth this year, Microsoft has to nurture that investment, even at the cost of diluting Azure's differentiation.


Meanwhile, OpenAI clearly sees AWS as a massive opportunity—big enough that it is willing to forgo a portion of its Azure-related revenue in the coming years. Combined with the previous point, this also makes losing exclusivity easier for Azure's leadership to accept: without the revenue share paid to OpenAI, Azure's numbers will look much better. The new agreement also frees Microsoft from the AGI clause; now, whatever happens, the deal between the two companies persists until 2032.


It now seems quite clear that OpenAI's next focus will be AWS. The most compelling evidence is, in fact, the topic of this interview: Bedrock Managed Agents supported by OpenAI. The easiest way to understand the product is as Codex inside AWS. Codex works as well as it does largely because it runs locally, which lets many hard problems, security especially, take care of themselves. Enabling agents to operate across departments and systems within an organization, however, is a different ball game entirely. The goal of this product is to make these kinds of workflows easy for organizations that already keep most of their data on AWS.


Building on this point, in this interview, we discussed how AWS pioneered the entire cloud computing category and its impact on startups; we also explored the similarities and differences between AI and that earlier paradigm shift. Subsequently, we delved into Bedrock Managed Agents: what it is, and how it differs from Amazon's existing AgentCore product. We also talked about Trainium, why the chip is not that critical for most AI users, and why collaboration is a sensible choice compared to Google's emphasis on full-stack integration.


A reminder: all Stratechery content, including interviews, is available on the podcast; click the link at the top of this email to add Stratechery to your podcast player.


Enter the interview.


Interview Transcript


This interview has been lightly edited for clarity.


OpenAI Joins AWS, Ending Azure's Exclusivity Era


Ben Thompson (Host): Matt Garman, Sam Altman—Matt, welcome to Stratechery; Sam, welcome back. I previously interviewed Sam in October 2025, March 2025, and February 2023.


Sam Altman (OpenAI CEO): Thank you.


Matt Garman (AWS CEO): Thank you, happy to be here.


Host: Matt, this is your first time on Stratechery. Unfortunately, I think Sam's presence will prevent us from following the usual "guest introduction" segment. Besides, he probably doesn't want to hear us reminiscing about our days at Kellogg School of Management. However, it's great to have a fellow alum on the podcast.


Matt Garman: Yes, I'm thrilled to be here. Next time I come back, we can dig into that properly.


Host: That's great to hear. You've been at AWS since your internship days, and now you're leading the whole organization amid the AI wave. In your view, what's similar and, for now at least, what's different about building an AI business compared to building the original general-purpose computing business?


Matt Garman: I think the similarities lie in seeing the same excitement, in watching builders outside do things they couldn't do before. When we started AWS, the cool part was that developers could suddenly access infrastructure that had only been available to the largest companies. Previously, only companies with multimillion-dollar data-center budgets had those capabilities. Now a developer with a credit card and a few dollars could launch an app. That dramatically expanded what internet builders could accomplish.


Our idea back then was that people could build anything they wanted. We wouldn't presuppose what they should do. We believed creativity exists worldwide; as long as you put powerful tools in front of them, they would create interesting and amazing things.


I believe AI's empowerment of builders is equally transformative, if not more so. Think about what is now possible: you don't have to study programming for ten years to build an app; you don't need to have a massive team of hundreds of people, or spend months and months on end to make something. You can rapidly build and iterate with a small team. AI is unlocking innovation across various fields. In many ways, this is very similar to back then. Seeing the capability it brings to the customer base is truly very exciting.


Host: However, when AWS emerged, you were the only player, so in a sense, for better or worse, the benefits naturally accrued to you. Did you ever feel that the AWS era was largely about general-purpose computing (making compute fungible, elastic, and cheap), whereas in AI, especially in the training stage, the winning abstraction looks more like highly vertically integrated superclusters, extremely advanced networking, and very tight coupling between software and hardware? Was that unexpected? Because this time you weren't starting from scratch, nor were you the only ones here; you had a particular understanding of large-scale computing, but at least in AI's first years, the two didn't seem to line up.


Matt Garman: I'm not sure how different this is for us. I think what's truly different is the astonishing speed at which this is happening. I think that might catch everyone by surprise. Sam, feel free to chime in if you disagree. But the speed at which people are embracing these capabilities, and the speed at which they are seizing upon these capabilities, I think has surpassed everyone's expectations.


This is very different from when we first started with cloud computing. Back then, we spent a very long time explaining why a book-selling company would venture into providing computing power. We had to put in a lot of effort to explain what cloud computing was. There was a lot of heavy lifting involved, which people often forget now. But in 2006, no one took it for granted that the world's computing would move to the cloud. There was indeed a lot of hard explaining and pushing involved back then.


Host: So do you think some explaining is needed now as well? Many people were initially anchored in the training era, and you would say, "We are thinking about the inference era," which is a different thing. Do you need to exercise that explanatory muscle again?


Matt Garman: Yes, it is needed, but the speed at which people understand what you are talking about is completely different now. So I think, yes, when you have to take people from "this looks cool, I can chat with an AI-powered chatbot" to "it can actually get work done in your enterprise," there is indeed some educational process needed. But in terms of technological evolution speed, this process has been relatively fast.


Host: I promise we'll get to today's product soon. But Sam, from the perspective of the startup ecosystem, AWS was clearly transformative in hindsight; it completely changed the barrier to starting a company. Suddenly anyone could start one. Seed rounds and angel investors emerged, and the funding bar dropped. You didn't need a "we need to buy servers" line in your pitch deck; you could build an app first and then go raise a Series A or later rounds.


So, from your perspective, what are the differences between the world opened up by AWS back then and the world opened up by AI today? And what are the similarities?


Sam Altman: I believe there have been four platform moments in history that have massively empowered startup companies: the Internet, the cloud, mobile, and AI. Of these four moments, the first one I experienced as an adult was the cloud. In the early days of YC, it's hard to overstate how much of a change this was for startup companies.


Prior to that, startups had to rent colocation space, assemble their own servers, and rack the equipment. It was incredibly complicated, and you had to raise a lot of money first. Then, suddenly, the cloud emerged. It came just after YC was founded, around our second year.


Host: I was just about to ask this—ultimately, is YC more intertwined with the cloud than you realized at the time?


Sam Altman: At the time, we felt they were highly intertwined. YC felt like it was riding the wave of the cloud from the beginning because there were some early examples of cloud services even before AWS.


Host: With AWS in existence, the amount of funding needed to get a startup off the ground was indeed much less than before.


Sam Altman: It was a massively empowering shift, which is also why YC sounded so crazy at the time. People would say, "You can't invest a few tens of thousands of dollars in a startup; it's just not feasible, the server costs alone exceed that amount." So this completely changed what startups could accomplish with a small amount of capital.


Generally, when there is a major platform shift, and you can do things in a faster cycle and with less capital than before, startups win. This is the classic way startups defeat big companies. In the early days of my career, I witnessed firsthand the kind of change the cloud brought. Now, watching companies build products based on AI feels very similar in direction. But, as Matt said, the pace this time is incredibly frantic.


Host: Is there a situation where existing large companies, these industry giants, are adopting AI much faster than they adopted cloud computing back in the day?


Sam Altman: There are certainly more such cases. But I'm also talking about the speed of startup revenue growth. At a recent YC talk, I asked at the end, "What's the revenue expectation now for a good company finishing YC?" They said, "The answer changes every month. Within the same batch, the answer at the beginning may differ from the answer at the end." That has never happened before. The speed at which people are building scalable businesses on this new platform is something I have never seen.


Host: Matt, throughout the cloud era, AWS was basically the default cloud for startups, which gave you a huge advantage. So what still makes you the default today? Many are now building on the OpenAI API. Or do you actually feel, "We came into this market from a very different position: we have a huge installed base of customers all asking for AI capabilities, but among the startup cohort Sam describes, our visibility isn't as high"?


Matt Garman: I think there are a few parts to this. First, we are very excited about this partnership, and I believe it will matter a great deal to many startups. But even today, if you talk to startups, most of the ones that are scaling are scaling on AWS, and there are many reasons for that: the scale is there, the availability is there, the security is there, the reliability is there, the ecosystem of other ISV partners is on AWS, and the customers are on AWS.


Host: (laughs) Whether they like it or not, everyone has used the AWS console, so they are also used to it.


Matt Garman: And we will help them. We have spent a lot of time empowering startups, not just by giving credits, but also by advising them on how to build systems, how to think about go-to-market, and many similar things. I think many startups really appreciate these efforts. We have invested a significant amount of time and energy to ensure this because we truly believe that startups are the lifeblood of AWS. It has been this way from the beginning, as Sam just mentioned, and it is still true today. I still go to Silicon Valley or elsewhere every quarter, directly meet with startups, listen to what they are doing, and confirm whether what we are building can really meet their needs.


So, the competition for startup attention today is indeed more intense than 20 years ago. But this is still as important to us as it was in the past. We invest a lot of time to ensure that we can meet the needs of these startups.


Host: Is it fair to say that those building directly on the OpenAI API, rather than the Azure OpenAI service, are more likely to adopt this kind of stack: regular compute on AWS, with OpenAI handling the AI part?


Matt Garman: I think this is a very common pattern for many startups today, absolutely.


Bedrock Managed Agents: Bringing AI Agent into Enterprise Workflows


Host: That brings us to today's announcement: Bedrock Managed Agents supported by OpenAI. I think I got the name right. As I understand it, the selling point is not just that OpenAI models can be used on AWS, which I believe wasn't allowed until now, but that OpenAI's frontier models are wrapped in an AWS-native agent runtime: identity, permissions, logging, governance, and deployment. Sam, is that an accurate portrayal?


Sam Altman: Yes, that's a good summary.


Host: Thank you. So what exactly is this? Now, please explain it in plain language.


Sam Altman: I think the next stage of AI will move from "you give the agent some text, and then you get back more text," even from "you give it a bunch of code, and then get back more code," to a new stage: these agents will run inside companies to perform various types of work.


"Virtual colleagues" is the best way I've heard it described, but no one has really found the most accurate language to describe it yet. We are collectively building a new product to help companies that want to build these stateful agents, make them real, and make them usable. Again, I don't think we yet know how the world will ultimately talk about these agents or use them. But if you look at what Codex is doing, I think that is a good example that can show us where all of this is heading.


Host: To truly get an AI agent up and running, having just the model is not enough. It needs a whole set of supporting systems: a runtime environment, callable tools, task states, memory, permission management, and performance evaluation. You mentioned "state" specifically earlier. How crucial is this infrastructure outside of the models for whether the agent can truly function?


Sam Altman: Its importance cannot be overstated. I no longer see the supporting system (Harness) and the model as completely separable things. From my own user experience, when I initiate a task in Codex and it does something amazing for me, one thing is very clear: I don't always know how much credit should go to...


Host: Is it the model that's strong, or is it the supporting system (Harness) that's strong?


Sam Altman: Yes, exactly.


Host: To what extent was the supporting system (Harness) developed alongside the model? Where did this integration happen? Was it in the post-training phase? Was it in the prompt? What exactly makes this integration work?


Sam Altman: Both. It's actually not really part of the pre-training process. But I will say, there is a more interesting phenomenon here: in the past, we have seen multiple times that some things we originally thought were very separable end up being baked more deeply into the system.


For example, our initial understanding of tool-calling. Now it's a key part of how we use these models, but initially, we didn't think it needed to be deeply integrated into the training process. Over time, we have done more and more in this regard.


I also suspect that models and the Harness will converge more and more over time. Furthermore, I also expect that pre-training and post-training will eventually converge more over time. It sounds like a cliché when I say this, but I still say it because I think it's very, very true: we are still very early in the whole paradigm. The level of maturity in this industry is probably still akin to the Homebrew Computer Club era.
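To make the harness idea concrete: a harness is, at minimum, a loop that holds conversation state, executes the tool calls a model requests, and keeps going until the model produces a final answer. The sketch below is illustrative only; the tool, the message format, and the stubbed model are all invented here, not OpenAI's or AWS's actual implementation.

```python
# A minimal, hypothetical harness loop. The "model" is a stub standing in
# for an LLM call; the tool registry and message format are invented.
import json

def lookup_weather(city: str) -> str:
    """A toy tool the harness exposes to the model."""
    return json.dumps({"city": city, "forecast": "sunny", "high_c": 24})

TOOLS = {"lookup_weather": lookup_weather}

def stub_model(messages):
    """Stand-in for a model: request one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "lookup_weather",
                "arguments": {"city": "Seattle"}}
    return {"type": "final", "content": "Sunny, around 24C."}

def run_harness(user_prompt: str) -> str:
    """Hold state, execute requested tools, loop until a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(10):  # cap steps so a confused model cannot loop forever
        action = stub_model(messages)
        if action["type"] == "final":
            return action["content"]
        result = TOOLS[action["name"]](**action["arguments"])
        messages.append({"role": "tool", "name": action["name"],
                         "content": result})
    return "Gave up after too many steps."

print(run_harness("What's the weather in Seattle?"))
```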


Host: That's also why I find this so interesting. I wrote a few weeks ago that in any value chain, there will eventually be an integration point, which is crucial because two parts need to come together to make things work. Over time, much of the value will obviously settle there. My assessment at the time was that the integration of the Harness with the model is this key point. This certainly aligns with your interests, but it sounds like you also agree with this assessment.


Sam Altman: That does indeed align with my interests, and I do agree. But I would go broader: what really matters is that you input into Codex what you want to happen, and then it actually happens.


Host: You don't care about the implementation details.


Sam Altman: We have had too many examples in the process of figuring these things out: there are things that initially had to be solved at the system prompt level, and then they weren't needed anymore. The overarching observation here is that as the models get smarter, you have more flexibility to make them act the way you want. It sounds like an obvious statement, but it really is...


Host: It's easier to have a 10-year-old do something than a 5-year-old.


Sam Altman: Looking back at the GPT-3 era and what we had to do to extract even a little utility from those models: you just don't have to do any of that anymore, because the models understand and get things done out of the box. That trend may well continue further.


Matt Garman: I'd like to add a point. I completely agree with Sam's point of view. And when you communicate with customers, they actually know very well what they want these systems to do. Before our collaboration this time, customers were somewhat forced to put these things together themselves. They hoped that these models and agents could remember certain things, could collaborate well, could integrate into their existing systems. And this is not just a third-party tool issue, it also includes their own tools. They hoped that these agents could understand their own data, their applications, and their operating environment. And today, at least for now, all this integration work needs to be done by each customer themselves.


So, as part of this collaboration, we are building a new type of product together, bringing these pieces closer so customers can more easily do what they want to do. For example, identity is already built into the product; connecting to databases and handling authentication will also happen inside your AWS VPC, the Virtual Private Cloud. With just the OpenAI API on one side and AWS on the other, these things could in theory still be done. But by building this together, we make it easier and faster for customers to realize value and do what they want in their own business environment.


Host: So are you saying that with a general-purpose harness you could also build a working agent, it's just much harder, and you're making it easier? Or are you saying that unless these pieces are tied together, some things simply can't be done?


Sam Altman: Going back to your earlier analogy: before AWS, if you were willing to stand in a data center, buy a bunch of servers, figure out how to wire them up, and hire your own network engineer, you could indeed do a lot of things. You could make a lot happen. Then, suddenly, you could just log into the AWS console, click something like "I need another S3 bucket," and do even more, because the activation energy and the groundwork required had dropped dramatically.


Today, of course, you can also do a lot of things with models. But every time I see someone using our model, or trying to build the workflows Matt just mentioned, I feel very conflicted. On the one hand, I am happy that they find these models impressive, that they think it's a magical technology; on the other hand, I'm also about to go crazy because they have gone through too much pain and torment to get anything to really run.


This is not only true for the developers building these products. Even just using ChatGPT, I see people copying and pasting from here to there, trying to craft a sophisticated prompt, and I know all of this will go away, and that excites me. Everything is still very early now and also very messy.


Host: As long as you don't remove its integration with BBEdit. That's my favorite feature of the ChatGPT app, hands down.



Sam Altman: Okay.


Host: (laughs) Thank you.


Sam Altman: First, these things are just really hard to do right now. We think that if it can be made much easier, it will bring a lot more value to developers and businesses. Second, there are many things that simply cannot reliably run right now. I believe that through our collaboration this time, it will not only be a story about usability, not just about issues like 'not having to build your colo' anymore. We will also explore a lot of new things together, allowing people to build products and services that in the past could not be achieved even after enduring a lot of pain and torture.


What's Truly Challenging for Enterprise Agents Is Permissions, Data, and Security


Host: I want to come back to the point of 'what can be built' later. But first, quickly back to Codex. Codex is a Harness-plus-model system, and it runs locally. Why is it now easier for agents to work locally?


Sam Altman: Actually, we initially had it running in the cloud. I think ultimately you do want it to be in the cloud.


Host: Of course. I was following that move toward a cloud product. But why did you come back to local?


Sam Altman: Because your whole environment is there. Your computer is set up, your data is there, you don't have to think about too much. While this is not the final state, it's just easier to get it up and running.


However, entering a world where agents truly run in the cloud would obviously be great. For example, if you have a very intensive task, or you need to shut down your computer, or any other situation, you can offload the work to the cloud to continue processing. This direction would clearly be fantastic. But in the short term, the usability we can provide is still significantly more advantageous in a user's local environment.


Host: Here's my mental model: the old security posture was "castle and moat," and now you're moving to zero trust, where everything needs the right permission structure, authentication mechanisms, and all those details. To me, running things locally is a kind of self-imposed castle and moat: everything is local, so I assume it's all fine and easy to manage.


And my understanding of this product is that to make all these parts truly work in a production environment, you can't possibly have everything locally. You have to run in this environment from the get-go. Matt, does this statement resonate with you?


Matt Garman: I don't think any computing environment ever fully divorces itself from the client. Running locally does have its benefits. There are reasons most of your iPhone apps have local components, whether it's connectivity, latency, local computation, or access to files and apps.


The local client has its place. As Sam said, it's simple and it runs well, but it's also constrained; it has boundaries. You can't scale up your laptop; what you have is what you get. Once you enter an enterprise scenario, like sharing between two people, things get more complicated, and reasoning about permissions and security boundaries gets harder.


So, there are many parts here. I wouldn't say the local environment is a bad thing; it's just another thing. I think ultimately you'd want to bridge between on-premises and the cloud.


Host: That's exactly my point. In the cloud era, containers helped bring the local environment closer to production. But with agents, as you just said, it's like a virtual colleague or something similar. If they have their own identity, their own permissions, and all of that, then even just to build them, you need to be in the environment where they will eventually be deployed. That's how it seems to me.


Sam Altman: I think there's still a lot to figure out here. For example, if you're an employee of a company, when you use a certain service, should you only have one account? And should your agent also use your account? Or should your agent use a different account so that the server can distinguish who's who?


Host: Or what if you want a lot of agents?


Sam Altman: Right. I suspect what we really need is something we haven't thought of yet. Maybe when Ben's agent logs in as Ben, it uses Ben's account but identifies itself as an agent, not the real Ben. We don't even have a basic concept to think about this yet, but we may have to figure it out soon.


And my sense is that there will be another 50 things like this. As agents join the workforce and act with increasing autonomy and task complexity, many of our mental models about how software works, how access control and permissions work inside companies, and in the broader internet will have to evolve.
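One way to picture the "logs in as Ben but identifies itself as an agent" idea is an actor claim on the access token, loosely borrowing the "act" claim from OAuth 2.0 token exchange (RFC 8693). The claim names and scopes below are assumptions made up for illustration, not an actual OpenAI or AWS schema.

```python
# Hypothetical token claims: the agent acts for Ben but is distinguishable.
token_claims = {
    "sub": "ben",                          # the human principal
    "act": {"sub": "ben-agent-7",          # the actor: a specific agent
            "kind": "agent"},
    "scope": ["crm:read", "calendar:write"],  # narrower than Ben's own access
}

def is_agent_request(claims: dict) -> bool:
    """Lets a server tell Ben's agent apart from the real Ben."""
    actor = claims.get("act")
    return bool(actor) and actor.get("kind") == "agent"

assert is_agent_request(token_claims)
```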


Host: Matt, how do you view agent security, access policies, and similar issues?


Matt Garman: Yes, I do think that as you move more of these workloads to the cloud, as a centralized organization, you can exert more control over the security aspects. We have been in constant communication with customers, and this is indeed a concern for them. They would say, "I love the powerful models and opportunities agents can bring, but how do I ensure that I don't mess up and create an event that could end the company?"


This concern is real.


I think we can help in this regard because these issues are solvable. They really are. I think we can give customers some confidence: for example, "It runs inside this VPC," so at least you can control that boundary and know what it has access to; or it goes through a gateway, you can assign it permissions just like you would assign a role to it in other parts of your environment.


These are capabilities we have built over the past 20 years. We have built very rich capabilities around these structures, not only allowing Y Combinator startups to use AWS, but also enabling global banks, healthcare institutions, and government agencies worldwide to use AWS. I believe the entire security structure built around AWS can help us further accelerate customer adoption of this technology while providing the security guardrails they need for swift action.


Many times, in an enterprise, especially in industries with a strong risk-averse tendency, having these security guardrails that allow them to say, "As long as it runs in this sandbox, I'm willing to move fast," can actually help many customers start using this technology in broader scenarios.
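The "it runs inside this VPC" guardrail Garman describes can be pictured as an IAM-style policy attached to the agent's role. Everything concrete below (the bucket name, the VPC ID, the exact conditions) is a made-up illustration of the shape of such a boundary, not the product's actual configuration.

```python
# A sketch of a sandbox policy for an agent role: read one approved bucket,
# only from inside a specific VPC, and never touch IAM. Names are invented.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::acme-agent-approved-data/*",
            # Only honor requests that originate inside the sandbox VPC.
            "Condition": {"StringEquals": {"aws:SourceVpc": "vpc-0abc123"}},
        },
        {
            # Deny anything that would let the agent widen its own access.
            "Effect": "Deny",
            "Action": ["iam:*"],
            "Resource": "*",
        },
    ],
}
```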


Host: You mentioned earlier many capabilities that AWS has built over the past 20 years, and now you are trying to apply them to agents. These capabilities are now exposed through AgentCore. So, what is the relationship between Bedrock Managed Agents supported by OpenAI and Bedrock AgentCore?



Matt Garman: Many of the things we have jointly built are based on the building blocks of AgentCore, bringing these parts together.


Host: So it's sort of like a superset sitting on top of AgentCore?


Matt Garman: Both the AWS team and the OpenAI team used components of AgentCore, combined with OpenAI models and many other parts, to collectively build this product.


AgentCore can be understood as a set of foundational building blocks we provide. Just like on AWS, if you want to build your own agent workflow, you can directly use these modules: such as memory components, secure execution environments, permission management capabilities, and more. You can configure these capabilities yourself and combine them to create an agent system that suits your business. Some customers are already running these capabilities in production environments and have created some very cool applications.
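As a rough sketch of what "configure these capabilities yourself and combine them" means in practice, here is the shape of hand-assembling such building blocks. The class names are invented for illustration; they are not the actual AgentCore SDK.

```python
# Illustrative only: wiring agent building blocks together by hand.
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Long-lived state the agent carries between tasks."""
    facts: list = field(default_factory=list)

@dataclass
class Runtime:
    """Where the agent's code executes, e.g. a sandboxed environment."""
    sandbox: str = "isolated-vpc"

@dataclass
class Permissions:
    """What the agent is allowed to touch."""
    scopes: tuple = ("db:read",)

@dataclass
class Agent:
    model: str        # e.g. an OpenAI model identifier
    memory: Memory
    runtime: Runtime
    permissions: Permissions

# Today each customer does this wiring themselves; a managed product
# collapses the assembly into configuration.
agent = Agent("some-openai-model", Memory(), Runtime(), Permissions())
```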


Host: But not using OpenAI.


Matt Garman: But not using OpenAI. They must use different models today, that's true. Actually, no, that's not entirely true. We do have people using OpenAI for this.


Host: Oh, just invoking another cloud-based model or a similar approach.


Matt Garman: They are just directly calling the OpenAI model. So today, there are indeed people using OpenAI to do this, just not natively within Bedrock, but they are still using it. This is an open ecosystem where you can pull in different capabilities to build whatever you want to build. I bet people will continue to do so.


There are some builders out there who really like—borrowing Sam's analogy—even though it's not necessary today, they still enjoy building their own computers at home. People like to build. We think, for a long time, people will continue to build their own agents. But the vast majority of them will want a simpler way; they don't want to configure all these parts themselves. This is one of the things we are introducing through this partnership.


Host: I want to make the distinction clearer. Bedrock Managed Agents is a managed service, but users can also use AgentCore to connect to different models on AWS or any other cloud. Sam, is that the difference from OpenAI on Azure? Put simply: on Azure, users mostly access the OpenAI API directly, while on Amazon this is a more complete managed agent service. Is that understanding correct?


Sam Altman: That's correct.


Host: Are you confident about this? Are the terms and scope defined clearly enough that it won't become an issue down the road?


Sam Altman: Yes. I think things will evolve over time, but as a starting point, I have a lot of confidence in this approach.


Host: Will this be an exclusive product for AWS? Or do you also expect to provide a similar managed experience on other clouds?


Sam Altman: Yes, we will be doing this exclusively with Amazon, and we are excited about it.


Host: How much of this exclusivity is simply, "Look, we're using all of Amazon's APIs, so naturally it's only on Amazon"? Or is it not just "we're using Amazon APIs," but that the entire managed-experience concept will, for now, live on Amazon?


Sam Altman: In spirit, we want this to be a joint effort between the two companies.


Host: I see. One point mentioned in the press release, which also ties back to what Matt just said: in theory, you could call other APIs and stitch everything together yourself. But in this case, customer data will stay within AWS. So what exactly can OpenAI see? What does this statement mean?


Matt Garman: Yes. Essentially, everything will stay within your VPC, so the data will be protected within the Bedrock environment.



Host: I see. This product runs OpenAI models through Bedrock; will those models run on Trainium?


Matt Garman: They will run in a mixed manner—part will run on Trainium, and part will run on GPU.



Host: Is this purely a matter of timing? Because I remember in your announcement a few months ago...


Matt Garman: Part of it is timing, part of it is capability. I think as we collaboratively build the system, we will mix and match different components and use appropriate infrastructure for different parts. But over time, more and more parts will run on Trainium.


Sam Altman: We are very excited to have these models running on Trainium.


AI Platform Competition, Moving from Models to Infrastructure


Host: I can imagine. Matt, a quick question about Trainium that is also a more general one. Here is my current understanding of Trainium, and I want to confirm it. The name is rather unfortunate, because its real importance going forward will be inference. It will mostly be surfaced through managed services like Bedrock; in other words, customers may not even know exactly what compute they are using. Is that a fair understanding?


Matt Garman: First and foremost, I'll take the blame for the terrible naming of all AWS services.


Host: It's okay, I run a word-of-mouth website called Stratechery, so I totally understand the issue with bad naming.


Sam Altman: I think the term Trainium is pretty cool.


Matt Garman: It is indeed cool.


Host: The term is cool, it just feels more like an inference chip than a training chip, in my opinion.


Matt Garman: Yeah. But, leaving the naming aside, it's useful for both training and inference. Honestly, it's a chip that we're very excited about. Whether it's this generation or future versions, we believe it's going to be a big business and a significant enabler of a lot of the things we collectively want to do.


By the way, I think, like GPUs, you're going to interact with a lot of these kinds of acceleration chips through an abstraction layer. The vast majority of customers don't really interact directly with a GPU, unless perhaps it's in their laptop for graphics or something. But when you're interacting with OpenAI, even though they're running on GPUs underneath, you're not talking to the GPU; when you're talking to Claude, whether it's on GPU, Trainium, or TPU underneath, you're not talking to those chips, you're talking to an interface.


Most of the inference out there is done by a small number of models. So whether it's five, ten, twenty, or a hundred models, it's not like millions of people are directly programming to these chips. It will continue to be that way in the future because these systems are too complex and too large in scale. If you want to train a model, not many people have enough money to train those models, or have the ability to really manage them. They're very complex systems, and the ability of the OpenAI team to extract value from large compute clusters is truly remarkable. But not many people have such a team. Regardless of the specific chip, I think this holds true for all acceleration chips.


Sam Altman: Ben, I increasingly feel that what we as a company need to do is become a token factory. But what customers really care about is that we deliver the best unit of intelligence at the lowest price, in the quantities and capacities they want.


Host: Do you think we will continue to adhere to the current pricing model? In other words, pricing based on tokens. Does this make sense in the long run?


Sam Altman: It doesn't make sense. Our recently released 5.5 model is actually an interesting example: its per-token cost is much higher than 5.4's, but the number of tokens needed to reach the same answer has dropped dramatically. In reality, you don't care how many tokens an answer took; you just want the job done. What you want is a price and the capacity you can get.


So maybe calling it a "token factory" earlier was incorrect. We are more like an intelligence factory, or something similar. We want to offer as much "intelligence capability" as possible at the lowest price. Whether it's a larger model running fewer tokens or a smaller model running many tokens; whether it's GPUs, Trainium, or something else; or if we do it in any other creative way, I don't think customers will care.


In fact, they won't even interact with these things directly. When you drop something into Codex or build a new agent in a stateful runtime environment, you shouldn't have to think about any of this. You should just be amazed at how much you got for so little.
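Sam's pricing point is easy to check with arithmetic. With purely hypothetical numbers (not the actual 5.4 or 5.5 prices or token counts), a model with a higher per-token price can still be cheaper per finished task:

```python
# Hypothetical figures: higher price per million tokens, far fewer tokens.
old = {"price_per_mtok": 2.00, "tokens_per_task": 50_000}  # a "5.4"-style model
new = {"price_per_mtok": 5.00, "tokens_per_task": 8_000}   # a "5.5"-style model

def cost_per_task(m):
    return m["price_per_mtok"] * m["tokens_per_task"] / 1_000_000

print(f"old: ${cost_per_task(old):.4f} per task")  # $0.1000
print(f"new: ${cost_per_task(new):.4f} per task")  # $0.0400, cheaper per task
```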


Host: The reduction in token usage, is it due to the model itself or the accompanying Harness system?


Sam Altman: Primarily the model, with a slight contribution from the Harness system.


Host: I see. Matt, by the way, I just asked Sam about exclusivity. Do you anticipate offering a similar managed service for other models in the future?


Matt Garman: We are currently focusing on doing this with OpenAI. We are very excited about what both sides are jointly building. As for the more distant future, that's a long way off.


Host: "The more distant future is a long way off," I'll have you hold on to that answer for now. It's okay, I have to ask this question.


Regarding customers, I have another question. Sam, building on your earlier point, and I'd like both of your perspectives: when a customer actually goes to production, where does OpenAI's responsibility end and AWS's begin? From what I'm hearing, if all the data is on AWS and stays there, and the customer is operating at a higher level, then ultimately it's AWS's responsibility? Is that the right way to think about it from the customer's perspective?


Matt Garman: Yes, I think that's right. When you need to contact someone, you reach out to AWS support to help you. This is part of your AWS environment, what you've built on AWS. Your AWS account reps are there to assist you. When we built it, we involved our OpenAI colleagues as well to help you figure out how to best utilize this product or address similar issues. In some cases, if we run into a bug that needs their help to resolve, we escalate it to them. But AWS will be your first-line interaction.


Host: Sam, how do you see this business in relation to the scale of OpenAI's core API business?


Sam Altman: I hope it will be very large. We are putting a lot of effort into this and committing to buying a lot of compute. I believe there will be a lot of revenue to support all of this. One framework I am increasingly believing in is: when the price is low enough, the demand for intelligence is essentially limitless.


Host: So from that perspective, its demand is very elastic? Price goes down, demand goes up?


Sam Altman: Absolutely. But take a different example: if you lower the price of water, maybe you'll drink a bit more, maybe you'll go from one shower a day to two; there is some elasticity there. But at some point you say, "You know what? I've had enough water."


Host: And if you absolutely need water, no matter how expensive it is, you will buy it.


Sam Altman: The same goes for other utilities. If electricity is cheaper, you will certainly use more electricity. But if you think of intelligence as a utility, I don't know of any other utility where I would keep thinking, "I just want more of it; as long as the price is low enough, I will keep using more."


Matt Garman: Interestingly, computational power is to a large extent the same way. Think about the cost of a computing cycle today compared to 30 years ago—it has become orders of magnitude cheaper, yet the amount of computation sold today is more than ever before.


Host: Right. Until you reach extreme scale where cost matters, people don't really think about the cost of compute; strategically, everyone just assumes they have it. So how much further does AI have to go before people stop having the knee-jerk reaction of "how much did I just spend?"


Sam Altman: I don't think that's the immediate reaction now. Today, many more customers ask us, "Regardless of the price, can you give me more? I just need more capacity, and I'm willing to pay more." In contrast, there are far fewer people haggling with us about price.


But I do believe we will keep lowering prices dramatically, and the drop will be quite astonishing. And maybe the more we do that, the more money flows into this area, more and more still. Either way, I am confident we will keep driving down the cost of the current level of intelligence.


One thing that somewhat surprises me, and I don't know if it will always be this way, but at least today, a significant portion of the total market demand is focused on cutting-edge models.


Host: Right, and there are a lot of issues there. Serving frontier models is very expensive when people could actually use the previous version. But are you saying that, regardless, people just want the most cutting-edge one?


Sam Altman: So far, yes.


Matt Garman: I think this is a very positive signal that we are far from achieving the desired state, and there is still too much demand not being met. I do think it's somewhat similar to the computing demand 40 years ago. At that time, a computer was extremely expensive, and now the computing power in everyone's phone far exceeds that of back then, and we have sold billions of such devices.


I believe the AI world will undergo the same thing. Today, everyone wants to use cutting-edge models because you need them to do a lot of useful work, and everyone is very excited about the capabilities they provide.


I think over time, you will have a mix of models. By the way, some smaller models will be able to do certain things, even things that the latest OpenAI model cannot do yet. But they will become smaller, cheaper, and faster over time. At the same time, there will be those super-large models that will attempt to tackle cancer and similar issues.


But I think we are still in the early stages of what's possible. And when you see this much demand and this much growth this early, the future is very exciting.


Host: Here's a somewhat cynical take. Sam, you have a set of customers saying, "We really want to use OpenAI models, but all our stuff is on AWS, and we're not moving." Matt, on your side it's, "All our stuff is on AWS; can you bring the OpenAI models over?" So this is really just meeting that demand, and because AWS is the largest, the demand is astronomical. Is that the simplest answer? Or is there another layer, where you genuinely believe you can deliver something highly differentiated that will also attract new customers for both of you?


Sam Altman: We are certainly very excited to be able to reach AWS customers, and many people really like AWS. Yes, that is true.


Matt Garman: That part is definitely true.


Host: (laughs) I see.


Matt Garman: Likewise, our customers are also very excited to have access to OpenAI technology.


Sam Altman: But I do believe that together we can build something incredible. I hope that a year from now, when people look back on this, the most important thing they talk about won't be "oh, we can finally access these models through AWS" or anything like that. Instead it will be, "Wow, we didn't realize how important this new product was."


I think, at the model, harness, and capability level, we are close to a new form of computing. It will make people feel very different from the current mindset of "I need this model's API" and the like.


Matt Garman: I completely agree, that's the key. The first part is good and nice; but the second part, I think, is what really excites all of us.


Host: Speaking of that, I mentioned earlier, I want to come back to this topic. I have a theory, not necessarily correct, and I'm curious how you see it, which is about "what else needs to be built." Specifically, there may eventually be a true middleware or middle layer. Within an organization, there will be various databases, SaaS applications, and various data fragments spanning different systems. There will be an agent layer or harness system on top. It seems like there's still something that needs to be built in between. OpenAI Frontier to some extent touches on this issue. Is this part of it? Or is this something to be built in the future? Or am I completely wrong, and we don't need this at all?


Sam Altman: You are absolutely right, we do need something there. Recently, in my conversations with clients, especially large enterprises, they would say, "I want some kind of agent runtime environment; I want a management layer that can connect my data to agents, while ensuring I understand where tokens are being spent, where they are not, and there is some kind of oversight; I also want some kind of workspace" — hopefully that would be Codex — "something similar for my employees to use."


What people are asking for is becoming very consistent. But now we need to actually go ahead and build the whole product.


Host: It sounds like we almost need two agent layers: one agent layer maintaining the middle layer, constantly reaching into the various data sources, and another that is the actual user-facing layer people interact with. Is that the direction we're heading, or am I off track?


Sam Altman: I agree with both of these points. This is likely what the world might look like today. But as models become truly intelligent, I don't think we know yet what the future architecture will look like.


Right now, at what you could call the user agent layer, people do want to interact with multiple agents. We let you build an agent for this thing and an agent for that thing; they can talk to each other, and so on. Then at the corporate management layer, people will have various control mechanisms governing things like how AI explores the file system.


Host: And then at some point, you realize, you're just holding onto the past for no reason. These things should have originally been done in the model.


Sam Altman: That's exactly what I wanted to say. At some point, you might say, "In fact, we have such amazing capabilities, let's redesign the entire architecture."


Matt Garman: Yes, I agree. I think there will indeed be something different here. I'm not sure if we all know what it is right now, but that's also the beauty of it. You let customers use it, build on it, and then you can learn from them, figure out how to make these things easier, faster, better for them.


Host: Sam, this is our second time doing this kind of product launch interview. The last time was with Kevin Scott talking about New Bing. Back then, you were quite confident about the threat you were posing to Google. How do you feel the outcome has been since?



Sam Altman: I think we are doing better than I expected. ChatGPT is, I believe, the first truly large-scale new consumer product since Facebook.


Host: Is this actually the answer? In other words, you are doing better than you expected, but it is mainly reflected in ChatGPT, rather than other areas?


Sam Altman: No, I think we have also done quite well on the API, especially with Codex. But that's not what I had in mind at the time. I was thinking that maybe these new language interfaces would change how people seek information on the internet. And, Google is also an absolutely extraordinary company. I think in many ways, Google is still underestimated in terms of the breadth and depth of what it does. But relatively speaking, I am satisfied with ChatGPT's performance.


Host: Matt, I have a similar Google-related question for you. Google was on stage just this week, and Thomas Kurian (CEO of Google Cloud) talked about their fully integrated stack, from chips to models up to the agent layer, with everything integrated. Today, you're appearing alongside another company's CEO, and by definition Amazon is not fully internally integrated.


Many people have criticized you for not having a frontier model of your own. But now we are entering the inference era, and you are used to serving a huge number of companies. So is it the case that, by maintaining a degree of neutrality, you've ended up in a better position? Was that intentional, or did you land in a good spot without realizing how important it would become?


Matt Garman: One thing is intentional. Since we started with AWS, we have always seen partners as a key part of supporting end customers. From the outset, this has been a very important part of our strategy: deep collaboration with partners. Perhaps unlike some other companies, we believe that if partners succeed, if they are building on top of us or with us, then we also succeed, and that's great.


We see it as coming together to make the pie bigger, that's a win. But that's not necessarily how other people see the world. Sometimes they say, "I have to own it all." That's okay too, that's a viewpoint.


But I think choice is essential. That's how the best products win. By the way, in such a world, you can have first-party products, and you can have many third-party products. But our view is, we want customers to be able to choose what's best for them. If what's best for them is something you've built yourself, great.


For us, if the best thing is built by our partners but it runs on top of us, we also see that as a win because that's the best thing for the customer. That's how we've always thought about it in the long term, and that's actually how we've built the Bedrock platform in the AI world. We want to support a wide range of models, support a wide range of capabilities. That's been the case from databases to compute platforms and everything else.


So I think it's a deliberate strategy. I also think it's a strategy that customers appreciate because they like it. We're excited to continue down this path.


Host: Yes, that's very interesting. There's a balance here between software, platform, and infrastructure, with everyone saying they'll serve everyone. But it feels like when you go back to the early days of AWS, it started from the I, which is Infrastructure. From my point of view, this almost gave you the most flexibility, allowing you to meet Sam in the middle. Sam has a strong S, which is Software; together, you're building a P, which is Platform. I guess you could put it that way.