After two accidents in one week, looking back at how the Anthropic co-founders were talking about "safety" a year ago

Bitsfull | 2026/04/02 12:43



Key Points Summary


Over the past week, Anthropic experienced two consecutive incidents:


First, a CMS configuration error publicly exposed nearly 3,000 internal files; then the npm release of Claude Code v2.1.88 shipped with a 59.8 MB source map, exposing roughly 510,000 lines of source code.
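

As context for how the second leak happens: bundlers often emit `.map` files alongside minified JavaScript, and a source map's `sourcesContent` field can embed the entire pre-bundle codebase, so if the package manifest doesn't exclude those files, `npm publish` ships them. A minimal pre-publish check is sketched below; it relies on `npm pack --dry-run --json` (available in recent npm versions) and is an illustrative CI guard, not Anthropic's actual tooling:

```python
import json
import subprocess

# List exactly what `npm publish` would ship, without publishing anything.
# `npm pack --dry-run --json` reports the tarball contents as JSON.
result = subprocess.run(
    ["npm", "pack", "--dry-run", "--json"],
    capture_output=True, text=True, check=True,
)
packed = json.loads(result.stdout)

# Flag source maps: via `sourcesContent` they can embed the original,
# un-minified sources, so shipping them can expose the whole codebase.
leaks = [f["path"] for f in packed[0]["files"] if f["path"].endswith(".map")]

if leaks:
    raise SystemExit(f"refusing to publish, source maps in tarball: {leaks}")
print("tarball clean: no source maps")
```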


For a company that has built "safety" into its very DNA, suffering back-to-back operational failures is the height of irony.


Before rushing to mock, it may be worth going back to an internal conversation among Anthropic's seven co-founders from over a year ago. The podcast was recorded in December 2024. In it, the seven discuss how the company was founded, how the RSP (Responsible Scaling Policy) was drafted, why the word "safety" should not be used lightly, and the Dario quote that has been cited again and again ever since:


"If a building's fire alarm goes off every week, then it's actually a very unsafe building."


Listening to this quote now definitely hits differently.


The Seven Co-founders: A Quick Introduction


Dario Amodei | CEO, former VP of Research at OpenAI, with a background in neuroscience; the final decision-maker on Anthropic's strategy and safety roadmap. He spoke the most in this conversation.


Daniela Amodei | President, Dario's sister. Previously spent five and a half years at Stripe, leading the Trust and Safety team, with earlier experience in the nonprofit and international development sectors. She primarily oversees Anthropic's organizational structure and external communication.


Jared Kaplan | Physics professor turned AI researcher, one of the core authors on scaling laws. Often provides an outsider's perspective and claims he switched to AI initially because he "got tired of doing physics."


Chris Olah | A leading figure in interpretability research who joined the Bay Area AI community at 19 and has worked at both Google Brain and OpenAI. The most idealistic technologist at Anthropic.


Tom Brown | First author of the GPT-3 paper, now overseeing Anthropic's compute. Focused on engineering and infrastructure; in the podcast he describes how he went from not quite believing AI would progress so rapidly to changing his mind.


Jack Clark | Former Bloomberg technology journalist, now Head of Policy and Public Affairs at Anthropic. Acted as the host in this conversation, handling introductions and follow-up questions.


Sam McCandlish | Research co-founder. He spoke the least, but often nailed the key point in a single sentence.


Key Insights Summary



Why Pursue AI: From Boredom in Physics to "Seeing Enough to Believe"


Jared Kaplan: "I had been doing physics for a long time and was a bit bored. I also wanted to work with more friends, so I switched to AI."


Dario Amodei: "I don't think I explicitly convinced you. I just kept showing you AI model results. At some point, after showing you enough, you said, 'Hmm, this looks right.'


Contrarian Bets: Much of the consensus is herd mentality disguised as maturity


Jared Kaplan: "Many AI researchers have been deeply scarred by the AI winter, as if ambition is not allowed."


Dario Amodei: "My deepest lesson from the past decade is that much of the 'everyone knows' consensus is actually herd mentality disguised as maturity. After witnessing several overnight consensus reversals, you'll say: No, we're betting on this. Even if you're only 50% correct, you contribute much that others haven't contributed."


Safety and Scalability Are Intertwined


Dario Amodei: "One of the motivations for scaling up the model at the time was that the model had to be smart enough for RLHF to work. This is still something we believe in: safety and scalability are intertwined."


RSP, the Responsible Scaling Policy, Is Anthropic's "Constitution"


Tom Brown: "For Anthropic, RSP is like our constitution. It is a guiding core document, so we are willing to invest a lot of time and effort in refining it repeatedly."


Dario Amodei: "RSP will prevent plans that do not meet safety standards from progressing. We are not just paying lip service; we are genuinely integrating safety into every aspect."


When the fire alarm goes off too many times, no one runs when there is a real fire


Daniela Amodei: "We cannot casually use the word 'safety' to steer progress. Our real goal is to make it clear to everyone what we mean by safety."


Dario Amodei: "What truly undermines safety is often those frequent 'safety drills.' If there is a building where the fire alarm goes off every week, then that is actually a very unsafe building."


The 'Noble Failure' Trap


Chris Olah: "There is a notion that the most ethical action is to sacrifice other goals for the sake of safety, to demonstrate one's purity of purpose for the cause. But this approach is actually self-defeating. Because it leads to decision-making falling into the hands of those who do not value safety."


Co-founders Commit to Donating 80% of Income


Tom Brown: "We all commit to donating 80% of our income to causes that can drive social development, which is something everyone wholeheartedly supports."


Nobody Wants to Start a Company, But Feels They Must


Sam McCandlish: "None of us actually wanted to start a company at first. We just felt it was our duty because it's the only way to ensure AI progresses in the right direction."


Daniela Amodei: "Our mission is both clear and pure, a situation not commonly found in the tech industry."


Interpretability: There's a Whole 'Artificial Biology' in Neural Networks


Chris Olah: "Neural networks are so beautiful, with many beauties we haven't seen yet. Sometimes I imagine, in ten years, walking into a bookstore, buying a textbook on neural network biology, and it will contain all sorts of amazing things."


AI Used to Enhance Democracy, Not Become a Tool of Authoritarianism


Dario Amodei: "We're concerned that if AI is wrongly developed, it could become a tool of authoritarianism. How do we make AI a tool that promotes freedom and self-determination? The importance of this area is no less than biology and interpretability."


From White House Meetings to Nobel Prizes: AI's Impact Has Long Transcended the Tech Circle


Jared Kaplan: "In 2018, you wouldn't think the president would call you to the White House to say they're looking at a language model."


Dario Amodei: "We've seen a Nobel Prize in the field of chemistry awarded to AlphaFold, and we should strive to develop tools that can help us create hundreds of AlphaFold-like tools."


Why Study AI?


Jack Clark: Why did we start doing AI in the first place? Jared, why do you do AI?


Jared Kaplan: "I did physics for a long time before, got a bit bored, and also wanted to work with more friends, so I got into AI."


Tom Brown: "I thought it was Dario who convinced you."


Dario Amodei: "I don't think I explicitly 'convinced' you; I just kept showing you AI model results, trying to convey their generality, not just for a single issue. At some point, I showed you enough, and you said, 'Well, this looks right.'"


Jack Clark: Chris, when you were doing interpretability research, did you know everyone at Google back then?


Chris Olah: "No. In fact, when I first came to the Bay Area at 19, I already knew many of you. I met Dario and Jared back then; they were postdocs, and I thought they were really cool at the time. Later, at Google Brain, Dario joined, and we sat side by side for a while, I also worked with Tom. Later on, when I went to OpenAI, I worked with all of you."


Jack Clark: "I remember meeting Dario at a conference in 2015 and wanting to interview you. Google PR even said I had to read all your papers first."


Dario Amodei: "I was writing 'Concrete Problems in AI Safety' at Google back then."


Sam McCandlish: "Before I started working with you, you invited me to the office to chat, and it was like you explained AI as a whole. I remember after our chat, I thought, 'Wow, this is much more serious than I realized.' You talked about 'big block of compute,' number of parameters, scale of human brain neurons, and so on."


Groundbreaking Scaling


Jack Clark: "I remember at OpenAI, when we were working on scaling laws, making the models bigger really started to work, and it continued to work strangely well in many projects, from GPT-2 to scaling laws to GPT-3, that's how we kept moving closer."


Dario Amodei: "We are the 'make-it-happen' kind of people."


Jared Kaplan: "We were also very excited about safety. At that time, there was an idea: AI would be very powerful, but might not understand human values, and might even be unable to communicate with us. Language models to some extent ensure that it understands a lot of implicit knowledge."


Dario Amodei: "There is also RLHF on top of language models. One of the motivations for scaling up the model at that time was that the model had to be smart enough for RLHF to work. This is still what we believe in now: safety and scalability are intertwined."


Chris Olah: "Yes, the scaling work at that time was actually part of the safety team as well. Because we felt that to make people take safety seriously, we first had to be able to predict AI trends."


Jack Clark: "I remember being at a UK airport, sampling fake news from GPT-2, and then sending it to Dario on Slack saying, 'This really works, it could have a huge policy impact.' I remember Dario's reply was 'Yes'."


Later, we also did a lot of release-related work, which was crazy.


Daniela Amodei: "I remember that release period, that was when we really started to collaborate, during the GPT-2 release."


Jack Clark: "I think that was very helpful for us. We first did something 'a bit strange but safety-oriented' together, and then later we did Anthropic, something larger in scale, equally strange but safety-oriented."


The Early Days of AI


Tom Brown: "Returning to the 'Concrete Problems' article. I joined OpenAI in 2016, and at that time, both you and I were among the earliest group of people. I felt that the article was like the first mainstream AI safety paper. How did it come about?"


Dario Amodei: "Chris knows, he was involved. We were at Google at that time, and I can't remember what my main project was at the time, it felt like this was something I procrastinated on."


We want to document the open problems in AI safety. At the time, AI safety was often discussed in abstract terms, and we wanted to ground it in real-world machine learning. This line of work has been ongoing for six or seven years now, but back then, it was seen as a strange idea.


Chris Olah: "I think it was, in some sense, almost a political project. At that time, many people didn't take security seriously. We wanted to compile a list of issues that everyone would agree were reasonable, many of which were already in the literature, and then get credible, cross-institutional people to co-sign."


I remember spending a long time discussing with over twenty researchers at Brain to garner support for publication. Looking back today, the specific problems may not all hold up, and they may not have been the most precisely formulated issues. However, if seen as a consensus-building effort to demonstrate 'there are real issues here that deserve serious attention,' then it was a significant moment.


Jack Clark: "Eventually, you will enter a very strange science fiction world. I remember in the early days of Anthropic, we talked about Constitutional AI, and Jared said, 'Let's write a constitution for the language model, and its behavior will change.' It sounded crazy at the time. Why did you think it was feasible?"


Jared Kaplan: "I discussed this with Dario for a long time, and I think simple methods often work surprisingly well in AI. The initial versions were quite complex, but they were continuously simplified until it boiled down to leveraging the model's strength in multiple-choice questions, giving it explicit prompts to tell it what to look for, and then we could directly write down the principles."


Dario Amodei: "This goes back to 'The Big Blob of Compute,' 'The Bitter Lesson,' and the 'Scaling Hypothesis': as long as you can give AI a clear objective and data, it can learn. A set of instructions, a set of principles—language models can read them, compare them with their behavior, and the training objective is right there. So, Jared and I believed that there was a way to make it work, as long as we iterated on the details."


Jared Kaplan: "It was all very strange for me in the early days. I transitioned from physics, and now everyone is excited about AI, making it easy to forget the atmosphere of that time. When I discussed these things with Dario back then, I felt that many AI researchers were psychologically scarred by the AI winter, as if 'having ambition' was not allowed. To discuss security, we first needed to believe that AI could be very powerful and very useful, but at the time, there was almost a prohibition on ambition. Physicists have an advantage of 'arrogance'; they often embark on ambitious endeavors and are comfortable discussing grand visions."


Dario Amodei: "I think this is true, a lot of things just couldn't be said in 2014. This is also a general problem in academia, except in certain areas, where institutions are increasingly risk-averse. Industrial AI has also inherited this mindset, and I believe it will only emerge from it around 2022."


Chris Olah: "There are two forms of 'conservatism': one is taking risks seriously, and the other is treating taking risks seriously and believing that ideas can succeed as arrogance. We were dominated by the latter at that time. In history, in the 1939 nuclear physics discussions, a similar situation arose: Fermi was cautious, while Szilard or Teller took risks more seriously."


Dario Amodei: "The deepest lesson I've learned over the past decade is that many 'everyone knows' consensuses are actually herd effects disguised as maturity. After you've seen consensus flip overnight a few times, you will say, 'No, we are betting on this.' It may not necessarily be correct, but you ignore the noise and place your bet. Even if you are only 50% correct, you will contribute a lot that others have not."


Public Attitudes Toward Artificial Intelligence


Jared Kaplan: "Today, in some security issues, it is also like this: the external consensus believes that many security issues would not naturally emerge from the technology, but in our research at Anthropic, we have seen that they do indeed naturally emerge."


Daniela Amodei: "But in the past 18 months, this has changed. At the same time, the world's emotions towards AI have also clearly changed. When we conduct user research, we more often hear ordinary users expressing concerns about AI's overall impact on the world."


Sometimes it's about jobs, bias, or toxicity; sometimes it's 'will it mess up the world, will it change the way humans cooperate.' These are things I didn't entirely anticipate.


Sam McCandlish: "For some reason, the ML research community is often more pessimistic about 'AI becoming very powerful' than the public."


Jared Kaplan: "In 2023, Dario and I went to the White House, and in the meeting, Harris and Raimondo basically meant: we are watching you, AI is a big deal, we are paying serious attention, but in 2018, you wouldn't have thought 'the president would call you to the White House to say they are monitoring language models'."


Tom Brown: "What's interesting is that many of us entered the scene when things were still uncertain, like Fermi with the atomic bomb, where there was some evidence that the bomb could be created but also a lot of evidence that it couldn't. However, he ultimately decided to try because if it were true, the impact would be significant, so it was worth pursuing."


From 2015 to 2017, there was growing evidence that AI could be a big deal. In 2016, I talked to a mentor: I had done startups before and wanted to work on AI safety, but my math wasn't strong enough and I didn't know what to do. At the time, some said you had to be fluent in decision theory, some said nothing crazy was going to happen with AI, and very few were genuinely supportive.


Jack Clark: "In 2014, when I did the ImageNet Trends report, people thought I was crazy. In 2015, when I wanted to write about NVIDIA for mentioning GPUs in papers, I was also considered crazy. In 2016, when I left journalism for AI, I even got emails saying, 'You've made the biggest mistake of your life.' At that time, from many perspectives, seriously betting on 'scaling up will happen' did indeed seem crazy."


Jared Kaplan: "How did you decide? Were you conflicted?"


Jack Clark: "I made a reverse bet: I demanded to be a full-time AI reporter with double the salary, knowing they wouldn't agree. Then I resigned after sleeping on it. Because I read archival files every day, I always felt that something crazy was happening, and at some point, you have to bet big with high conviction."


Tom Brown: "I wasn't that decisive; I hesitated for six months."


Daniela Amodei: "And at that time, the idea that 'engineers can also significantly drive AI' wasn't mainstream. It was believed that 'only researchers could do AI,' so your hesitation isn't surprising."


Tom Brown: "Later, OpenAI said, 'You can help with AI safety through engineering,' and that's what made me join. Daniela, when you were at OpenAI or when you were my manager, why did you join at that time?"


Daniela Amodei: "I was at Stripe for five and a half years, and Greg was my boss. I also introduced Greg and Dario to each other. He was in the process of founding OpenAI, and I told him, 'The smartest person I know is Dario. If you can get him on board, that would be your luck.' Later, Dario joined OpenAI."


Perhaps like you, I had been thinking about what to do next after Stripe. I had joined Stripe because I felt my earlier work in nonprofits and international development left me needing more skills, and I thought I might eventually return to that field.


Before joining Stripe, I felt I didn't have enough ability to help those less privileged than myself. So I was looking at other tech companies to find a new way to have a bigger impact, and OpenAI at that time seemed like a good fit. It was a nonprofit organization committed to a very important and impactful goal.


I had always believed in AI's potential, partly because I knew Dario, and they genuinely needed help with management, so the role felt like a great match for my background. I thought, 'This is a nonprofit with a group of great people and a noble vision, but their operation seems a bit chaotic.' That challenge excited me, because I could step in.


At that time, I felt like a utility player: I managed team members, led some technical teams, handled organizational growth, worked on the language team, and later took on other tasks, including some policy matters in collaboration with Chris. I saw many talented people in the company, which made me eager to help make it more efficient and better organized.


Jack Clark: "I remember after finishing GPT-3, you said, 'Have you heard of trust and safety?'"


Daniela Amodei: "I used to lead the trust and safety team at Stripe. For a technology like this, you might need to consider the trust and safety aspect. This actually bridges AI Safety Research and more practical day-to-day work, that is, how to make the model truly safe."


It is crucial to convey that this technology will have a significant impact in the future, while also doing the practical, day-to-day work that lays the groundwork for handling higher-risk scenarios later.


The Responsible Scaling Policy: Ensuring AI Develops Safely


Jack Clark: "Let's talk about how the Responsible Scaling Policy (RSP) came to be, why we thought of it, and how we are now applying it, especially considering the work we have been doing in the trust and security of the model. So, who initially proposed this RSP?"


Dario Amodei: "It was initially proposed by me and Paul Christiano, around the end of 2022. The initial idea was whether we should temporarily limit the model before it scales to a certain level until we find a way to address certain security issues."


But we later felt it was odd to simply halt scaling at one particular point and then lift the restriction. So we decided to set a series of thresholds: each time a model reaches a threshold, a battery of tests is required to evaluate whether it has acquired the capabilities that threshold is designed to catch.


At each threshold, stricter safety and security measures are required. We initially thought it might be better if this were enforced by a third party: if the policy were the responsibility of a single company, other companies might be reluctant to adopt it. So Paul designed a version of the policy himself. Of course, many details have changed over time, and our team has kept researching how to make the policy work better.


When Paul formalized the concept, we released our own version within a month or two of his announcement. Many members of our team were deeply involved in the process; I remember writing at least one of the early drafts myself, though the document went through multiple revisions.
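

Mechanically, what Dario describes is a gate: each capability threshold maps to a required set of safeguards, and scaling or deployment pauses until they are in place. A toy sketch of that control flow follows; the trigger scores and safeguard lists are invented, and the real RSP defines its ASL levels with far more nuance:

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    name: str             # e.g. an ASL level
    trigger_score: float  # eval score at which this threshold is reached
    safeguards: list[str]

THRESHOLDS = [
    Threshold("ASL-2", 0.2, ["basic misuse filters"]),
    Threshold("ASL-3", 0.6, ["hardened security", "expanded red-teaming"]),
]

def may_proceed(eval_score: float, deployed_safeguards: set[str]) -> bool:
    """Block scaling/deployment unless every triggered threshold's
    safeguards are in place, per the RSP's if-then structure."""
    for t in THRESHOLDS:
        if eval_score >= t.trigger_score:
            missing = set(t.safeguards) - deployed_safeguards
            if missing:
                print(f"{t.name} reached; pausing until in place: {missing}")
                return False
    return True

print(may_proceed(0.7, {"basic misuse filters"}))  # False: ASL-3 safeguards unmet
```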


Tom Brown: "For Anthropic, RSP is like our 'constitution.' It is a guiding foundational document, so we are willing to invest a lot of time and effort to refine it repeatedly to ensure its accuracy and completeness."


Daniela Amodei: "I think the development of RSP at Anthropic has been really interesting. It has gone through multiple stages and requires a variety of skills to drive its implementation. For example, there are some grand ideas, mainly led by people like Dario, Paul, Sam, and Jared. They think about, 'What are our core principles? What message do we want to convey? How do we ensure our direction is correct?'"


But beyond that, there is a very practical, operational side to the work, such as assessing and adjusting details during continuous iteration. For example, if we expected to reach certain goals at a given safety level and haven't, we reassess and make sure we can hold ourselves accountable for the results.


Additionally, there are many adjustments related to organizational structure. For example, we decided to redesign the organizational structure around the RSP to clarify responsibilities. I like the constitution analogy because it underscores the document's importance. Just as the United States built a whole set of institutions, such as the courts, the Supreme Court, the presidency, and the two houses of Congress, to ensure the constitution is upheld (institutions that have other duties but exist largely to uphold it), Anthropic's RSP is going through a similar process.


Sam McCandlish: "I think this actually reflects a core view of ours on security issues: Security issues are solvable. This is a very complex and challenging task that requires a significant amount of time and effort."


Just as in automotive safety, the relevant institutions and organizations were built up over many years. The question we face now is: do we have enough time to do this work? So we must quickly identify the key institutions AI safety needs and build them here first, while ensuring they can serve as a reference and be adopted elsewhere.


Dario Amodei: "This also helps to promote internal collaboration within the organization because if any part of the organization's behavior does not align with our security values, the RSP will somehow expose the issue, right? The RSP will prevent them from continuing with plans that do not meet security standards. So, it also becomes a tool that constantly reminds everyone to ensure that security becomes a fundamental requirement in the product development and planning process. We are not just talking about slogans but are actually integrating security into every aspect. If someone joins the team and cannot align with these principles, they will find it difficult to fit in. Either adapt to this direction or find it challenging to continue."


Jack Clark: "Over time, the RSP has become increasingly important. We have invested thousands of hours of work into it, and when I explain the RSP to senators, I say, 'We have put in place measures to ensure that our technology is not easily abused and can also ensure security.' Their typical response is, 'That sounds normal. Don't all companies do that?' It makes me laugh and cry a little; in fact, not every company does this."


Daniela Amodei: "Furthermore, I believe that in addition to promoting team alignment on values, RSP has also enhanced the company's transparency. Because it clearly documents what our goals are, everyone within the company can understand, and external parties can also clearly know our security goals and direction. Although it is not perfect, we have been continuously optimizing and improving it."


I think it is essential to state explicitly what core issue we are focusing on, because we cannot casually use the word "safety" to steer work, saying, "Because of a safety issue, we cannot do something," or "Because of a safety issue, we must do something." Our real goal is to make sure everyone understands what we mean by safety.


Dario Amodei: "In the long term, what truly damages security is often those frequent 'security drills.' I have said: 'If there is a building where the fire alarm goes off every week, then this is actually a very unsafe building.' Because when a real fire happens, no one may pay attention, and we must pay a lot of attention to the accuracy and calibration of the alarm."


Chris Olah: "Looking at it from a different perspective, I believe RSP has created healthy incentive mechanisms on many levels. For example, internally in the company, RSP aligns the incentive mechanisms of each team with security goals, which means that if we do not make sufficient progress in security, related work will be paused."



Jared Kaplan: "I agree with these points, but I think this may underestimate the challenges we face in developing the right policies, evaluating criteria, and setting boundaries. We have undergone a significant amount of iteration in these areas and are still continuing to optimize. A challenging issue is that for some emerging technologies, it is sometimes challenging to definitively determine whether they are dangerous or safe. Many times, we encounter a vast gray area. These challenges made me very excited during the early development of RSP, and still do. However, at the same time, I realize that executing this strategy clearly and making it truly effective is more complex and challenging than I initially imagined."


Sam McCandlish: "The gray areas are unpredictable because they are everywhere. Only when you truly start implementation can you discover the issues. Therefore, our goal is to implement everything as early as possible so we can identify potential problems quickly."


Dario Amodei: "You need to go through three to four iterations to truly achieve perfection. Iteration is a very powerful tool; you are almost never going to get it completely right the first time. So if the risks are escalating, you need to complete these iterations early rather than waiting until the end."


Jack Clark: "At the same time, you also need to establish internal policies and processes. While the specific details may change over time, fostering the team's execution capability is paramount."


Tom Brown: "I am responsible for Anthropic's computing resource management. For me, we need to communicate with external stakeholders as different external parties have different views on the pace of technological development. I initially thought technology would not progress so rapidly, but my view changed later on, so I can really relate to this. I find RSP particularly useful in my case, especially when communicating with those who think technology will develop slowly. We can tell them: 'We do not need to take extreme security measures until the technology has advanced to a critical level.' If they say, 'I believe things will not become urgent for a long time,' I can respond by saying, 'Okay, then we do not need to take extreme security measures for now.' This makes communication with the outside world much smoother."


Jack Clark: "So, in what ways has RSP influenced everyone?"


Sam McCandlish: "Everything revolves around evaluation; every team is conducting evaluations. For example, your training team is constantly assessing. We are trying to determine if this model has become powerful enough to potentially pose a risk."


Daniela Amodei: "This actually means we need to measure the model's performance according to RSP standards, including checking for signs that might raise our concerns."


Sam McCandlish: "Assessing the model's minimum capability is relatively easy, but assessing the model's maximum capability is very challenging. Therefore, we have invested a lot of research effort trying to answer questions like: 'Can this model perform certain dangerous tasks? Are there methods we have not considered, such as mind mapping, best event, or the use of certain tools, that could enable the model to perform highly risky behaviors?'"


Jack Clark: "In the policy-making process, these assessment tools are very helpful. Because 'security' is a very abstract concept, when I say, 'We have an assessment tool that determines whether we can deploy this model,' then we can collaborate with policymakers, national security experts, and experts in the CBRN (chemical, biological, radiological, and nuclear) field to jointly develop precise assessment criteria. Without these specific tools, such collaborations may not be possible at all. But once we have clear standards, people are more willing to get involved to help us ensure its accuracy. So, in this respect, the role of RSP is very significant."


Daniela Amodei: "RSP is also very important to me and often influences my work. I find it interesting that my way of thinking about RSP is somewhat unique, more from its 'tone,' that is, its way of expression. Recently, we have made significant adjustments to the tone of RSP because the previous tone was too technical and even somewhat confrontational. I spent a lot of time thinking about how to build a system that people would be willing to engage with."


"If RSP were a document that everyone in the company could easily understand, it would be much better. Just like our current OKRs (Objectives and Key Results). For example, what is the main goal of RSP? How do we know if we have achieved the goal? What is the current AI Security Level (ASL)? Is it ASL-2 or ASL-3? If everyone knows the key areas to focus on, identifying potential issues will become easier. On the contrary, if RSP is too technical and only a few people can understand it, then its actual utility will be greatly diminished."


"I am pleased to see RSP moving in a more understandable direction. Now, I think most people in the company, possibly even everyone, no matter what their position is, can read this document and think, 'This makes sense. I hope we develop AI under the guidance of these principles, and I understand why these issues are important. If I encounter a problem at work, I roughly know what to pay attention to.' We want to make RSP simple enough so that people working on the factory floor can easily say, 'The safety belt should be connected here, but it's not in place now,' and spot issues promptly."


"The key is to establish a healthy feedback mechanism that enables smooth communication among the leadership, the board, other company departments, and the teams actually involved in R&D. I think most problems often arise due to poor communication or information distortion. If issues arise solely for these reasons, it would be very unfortunate, right? Ultimately, what we need to do is put these concepts into practice and ensure they are straightforward and easy for everyone to understand."


The Founding Story of Anthropic


Sam McCandlish: "Actually, none of us initially had the intention to start a company. We just felt it was our responsibility, and we had to take action because it was the only way to ensure that AI development progressed in the right direction, which is why we made that commitment."


Dario Amodei: "My initial idea was very simple; I just wanted to invent and explore new things in a beneficial way. This idea guided me into the AI field, which requires a lot of engineering support and ultimately a lot of financial support."


However, I found that without a clear goal and plan for how to build a company and shape its environment, much of what got done would repeat the same tech-industry mistakes that had alienated me. Those mistakes tend to come from the same people, the same attitudes, the same mindset. So at some point I realized we had to do this in an entirely new way; it was almost inevitable.


Jared Kaplan: "I remember back when we were in graduate school, you had a whole plan trying to explore how scientific research could promote the public interest. I think this is very similar to our current thinking. I remember you had a project called 'Project Vannevar' at that time, aimed at achieving this. I was a professor back then, observed the situation, and deeply believed that AI's impact was growing at an incredibly fast pace."


But because AI research demands so much money, I realized that as a physics professor I couldn't drive these advances through academic research alone. I wanted to join trustworthy people in building an institution that would keep AI development moving in the right direction. Frankly, though, I would never advise anyone to start a company, nor did I ever have that desire; for me it was just a means to an end. I believe the key to success is usually to truly care about achieving something meaningful for the world, and then to find the best means of achieving it.


Building a Culture of Trust


Daniela Amodei: "I often think about our team's strategic advantages, and one that may sound somewhat unexpected but is crucial is our high level of trust among us. Getting a large group of people to have a shared mission is very challenging, but at Anthropic, we have been able to successfully instill this sense of mission in more and more people. In this team, including leadership and all members, everyone has come together because of a shared mission. Our mission is both clear and pure, a rarity in the tech industry."


I feel that the goal we are striving to achieve is filled with a profound sense of purpose. None of us started this because we wanted to start a company. We just felt it was necessary. We could not continue our work in the same way at our previous places and had to do it on our own.


Jack Clark: "At that time, with the emergence of GPT-3 and projects we had all been exposed to or involved in, such as scaling laws, we could clearly see the AI trend in 2020. We realized that without prompt action, we might soon reach an irreversible tipping point. We had to act to make an impact in this space."


Tom Brown: "Building on Daniela's point, I do believe there is a high level of trust within the team. Each of us is acutely aware that we joined this team because we wanted to contribute to the world. We also made a collective commitment to donate 80% of our income to causes that can drive societal progress, a decision everyone wholeheartedly supported: 'Yes, of course we will do that.' This level of trust is very special and rare."


Daniela Amodei: "I find Anthropic to be a company with very little political color. Of course, our perspective may differ from the average person, and I constantly remind myself of that. I believe our hiring process and the traits of our team members create a culture here that inherently rejects 'office politics'."


Dario Amodei: "Then there's the cohesiveness of the team, which is paramount. Whether it's the product team, research team, trust and safety team, marketing team, or policy team, everyone is working towards the same goal for the company. When different departments within a company pursue entirely different goals, it often leads to chaos. It's highly abnormal for them to think other departments are sabotaging their work."


I believe one of our most significant accomplishments is successfully maintaining the overall coherence of the company. Mechanisms like the RSP play a critical role in this. This mechanism ensures that it's not some departments causing problems while others try to fix them internally; instead, all departments are fulfilling their functions while collaborating under a unified theory of change framework.


Chris Olah: "I initially joined OpenAI because it was a nonprofit organization where I could focus on AI safety research. However, over time, I realized that this setup wasn't entirely suited to me, which led me to make some tough decisions. Throughout this process, I had a lot of trust in Dario and Daniela's judgment, but I didn't want to leave. I was hesitant to leave because I didn't believe that adding more AI labs necessarily benefited the world."


When we ultimately decided to leave, I was still skeptical about starting a company; I had argued that we should instead establish a nonprofit focused on safety research. But pragmatism, and a candid acknowledgment of real-world constraints, made us realize that founding Anthropic was the best way to achieve our goals.


Dario Amodei: "One important lesson we learned early on was: Underpromise, overdeliver. Stay grounded, face trade-offs, because trust and reputation are more important than any specific policy."


Daniela Amodei: "One unique aspect of Anthropic is the high level of trust and unity within the team. For example, when I see Mike Krieger insisting on not releasing certain products for security reasons, while also seeing Vinay discussing how to balance business needs to drive projects forward, I feel very special. Additionally, the technical security team, the reasoning team engineers are also discussing how to ensure the product is both secure and practical. This shared goal and pragmatic attitude are some of the most attractive aspects of the work environment at Anthropic."


Dario Amodei: "A healthy organizational culture is where everyone can understand and accept the trade-offs faced collectively. The world we live in is not perfect, and every decision requires finding a balance between different interests, a balance that is often impossible to satisfy completely. However, as long as the entire team can collectively face these trade-offs under a unified goal and contribute to the overall objective from their respective positions, this is a healthy ecosystem."


Sam McCandlish: "In a sense, this is an 'upward race.' Yes, it is indeed an 'upward race.' Although this is not a risk-free choice, things could go wrong, but we all agree: 'This is the choice we have made.'"


The Race to the Top in AI


Jack Clark: "But the market is fundamentally pragmatic, so as Anthropic succeeds more, others are incentivized to mimic the patterns of success that got us here. Also, when our success is closely tied to our actual safety work, that success creates a kind of 'gravity' in the industry, pulling other companies into the competition. It's like we invented the seatbelt, and others can follow suit — it's a healthy ecosystem."


Dario Amodei: "But if you go around saying, 'We will not develop this tech, and you can't do it better than anyone else,' that's not going to work, because you haven’t shown a viable path from the present to the future. What the world needs is for an industry or a company to find a way for society to transition from 'tech doesn't exist' to 'tech exists in a powerful form and is effectively managed by society.' I think the only way to achieve that is for individual companies and even eventually at an industry-wide level to face these trade-offs."


You need to find a way to stay competitive, even to lead the industry in some areas, while also keeping the technology safe. If you can do that, your pull on the rest of the industry will be very strong. Regulatory environments, top talent deciding where to work, customer perceptions: all of these will push the industry in the same direction. If you can prove that safety can be achieved without sacrificing competitiveness, finding those win-win solutions, then other companies will be incentivized to imitate that behavior.


Jared Kaplan: "I think that's why mechanisms like RSP are so important. We can clearly see where the technology is headed and understand the need to be highly vigilant about certain issues, but we also must avoid crying wolf and saying, 'Innovation should stop here.' We need to find a way for AI technology to provide customers with a useful, innovative, and enjoyable experience while laying out the constraints we must adhere to, constraints that will ensure the system's security and make other companies believe they can also succeed and compete with us — all under secure conditions."


Dario Amodei: "Several months later, as we rolled out RSP, the three most prominent AI companies also introduced similar mechanisms. Explainability research was another area where we made breakthroughs. Additionally, we collaborated with AI safety research organizations, and this overall focus on security is having profound effects."


Jack Clark: "Yes, the Frontier Red Team was quickly imitated by other companies. This is a good thing, as we hope all labs will test those potential high-risk security vulnerabilities."


Daniela Amodei: "Jack also mentioned earlier that clients are very concerned about security issues. Clients do not want the model to produce false information, nor do they want the model to be easily bypassed by security restrictions. They want the model to be useful and harmless. We often hear them say in client communications: 'We chose Claude because we know it is more secure.' I think the impact on the market is huge. We can provide trustworthy and reliable models, putting significant market pressure on competitors."


Chris Olah: "Perhaps we can further expand on Dario's earlier point. There is a saying that the most ethical behavior is 'noble failure.' That is, you should sacrifice other goals for security, even act in an unrealistic way to demonstrate your purity of purpose for safety. But I think this approach is actually self-defeating."


First, it hands decision-making to those who do not prioritize safety. If instead you work to align incentives, put the difficult decisions where there is the power to back the right call, and ground them in the strongest evidence, then you can trigger the 'race to the top' Dario described. In that race, it is no longer the people who care about safety who get marginalized; instead, everyone else is forced to follow your lead and join in.


The Future of Artificial Intelligence


Jack Clark: "So, what are you all excited about for what we're going to do next?"


Chris Olah: "I think there are many reasons to be excited about interpretability. One obvious one is for security reasons, but there is another reason that I find equally exciting on an emotional level, and that is that I think neural networks are wonderful, and there is much beauty we have yet to see. We always treat neural networks as a black box, not particularly interested in their internal structure, but when you start delving into them, you will find amazing structures inside."


This is a bit like how people view biology; some may think, 'Evolution is boring, it's just a simple process that has been running for a long time, and then it created animals.' But in reality, every animal created by evolution is full of incredible complexity and structure. And I think evolution is an optimization process, just like training a neural network. Inside neural networks, there is a whole complex structure akin to 'artificial biology.' If you are willing to delve into them, you will find many amazing things."


I feel like we are just beginning to slowly unveil its mystery. It's so incredibly fascinating, with so much waiting for us to discover. We are just starting to open the doors, and I believe the upcoming discoveries will be very exciting and marvelous. Sometimes I imagine myself walking into a bookstore ten years from now, buying a textbook on neural network explainability, or a book truly delving into the "biology" of neural networks, filled with amazing content. I believe that in the next ten years, or even in the next few years, we will truly start to uncover these things, and it will be a wild and awe-inspiring journey.


Jack Clark: "A few years ago, if someone had said, 'The government will establish a new agency to test and evaluate AI systems, and these agencies will be highly professional and impactful,' you might not have believed it was true. But it has happened. It can be said that the government has established 'new embassies' to deal with this new category of technology, and I look forward to seeing where this will lead. I think this actually means that nations have the ability to address such social transformations, not just relying on corporations, and I am glad to be part of it."


Daniela Amodei: "I am already excited about this, but I feel that just imagining what future AI can do for humanity is hard not to get excited about. Even now, the signs that Claude can help in developing vaccines, conducting cancer research, and biological research are already incredible. Seeing what it can do now is amazing, and when I look ahead three to five years, imagining Claude truly solving many fundamental problems we humans face, especially in the health field, also makes me very excited. Thinking back to my days in international development work, if Claude could have helped with the inefficient work I was doing back then, it would have been so amazing."


Tom Brown: "I think, from a personal perspective, I really enjoy using Claude in my work. So, recently, I've been using Claude at home to chat with it about some things, and the biggest change recently has been with coding. Six months ago, I hadn't used Claude for any programming-related work at home, and our team was also not using Claude much for coding, but that has significantly changed now. For example, last week, I gave a talk at an event hosted by Y Combinator. At the beginning, I asked everyone, 'How many of you are using Claude for programming now?' Almost 95% of people raised their hands. Almost everyone in the room raised their hands, which is completely different from four months ago."


Dario Amodei: "When I think about what excites me, I think of places where it seems like we've already reached a consensus but in fact we're about to break that consensus, one of which is interpretability. I believe interpretability is not only key to guiding and ensuring the safety of AI systems, but it also contains profound insights into questions of intelligent optimization and how the human brain works; I've said that in the future Chris Olah will win the Nobel Prize in Medicine."


I used to be a neuroscientist, and I suspect many of the mental-health problems we haven't solved, like schizophrenia or mood disorders, involve some kind of higher-order systemic issue. The human brain is so complex and so hard to study that these issues are difficult to fully grasp. Neural networks are not a perfect analogy, but they are nowhere near as inscrutable or as hard to probe as the brain, and over time they will become a better analogical tool.


Another related area is AI in biology. Biology is an extremely hard problem, and for various reasons people remain skeptical, but I think that skeptical consensus is starting to erode. We have seen a Nobel Prize in chemistry awarded for AlphaFold, a remarkable achievement, and we should strive to build tools that help us create hundreds of AlphaFolds.


Lastly, using AI to strengthen democracy. We worry that if AI is developed wrongly, it could become a tool of authoritarianism. So how do we make AI a tool for promoting freedom and self-determination? Progress here may come a bit earlier than in the first two areas, and it is no less important.


Jared Kaplan: "I think there are at least two points that echo your earlier points. One is that I think many people join Anthropic because they have a great curiosity about AI science. As AI technology advances, they gradually recognize that we need not only to drive technological development but also to deepen our understanding of it and ensure its safety. I find it exciting to work with more and more people who share a common vision of AI development and responsibility, and I think many technological advances in the past year have indeed fostered the formation of this consensus."


The other aspect, returning to practical issues: I think we've done a lot of work in AI safety, and with some recent developments we are starting to get an initial understanding of the risks very advanced systems might pose. That lets us study those risks directly, through interpretability research and other kinds of safety mechanisms.


Through this approach, we will be able to gain a clearer understanding of the risks that advanced AI systems may pose, allowing us to advance our mission in a more scientific and evidence-based manner. Therefore, I am very excited about the next six months as we will leverage our understanding of potential issues with advanced systems to further research and find ways to avoid these pitfalls.


Original Video Link