
At OpenAI, "innovation has become difficult"! A former executive shares the inside story

In the Core Memory podcast, former OpenAI Vice President of Research Jerry Tworek shared the reasons for his departure, pointing out that OpenAI faces an innovation dilemma: as competition intensifies and the organization expands, it has become difficult to undertake high-risk research. He argued that Google's rise is a result of OpenAI's failure to press its early lead, and that the industry's fierce talent war makes it hard for researchers to explore new directions outside mainstream machine learning. Tworek emphasized that the core of innovation lies in whether a company can create an environment that allows exploration, rather than relying solely on star researchers.
On January 23, Zhidongxi reported that yesterday, the Core Memory podcast hosted by renowned media personality and writer Ashlee Vance released an in-depth interview with Jerry Tworek, former Vice President of Research at OpenAI. Tworek, who decided to leave OpenAI in early January this year, shared a key insight: As competition intensifies and the organization rapidly expands, OpenAI is gradually falling into a structural dilemma where it can no longer afford to undertake truly high-risk research, and some cutting-edge innovative research directions have become difficult to advance within OpenAI.
Before diving into the interview content, it is necessary to understand Tworek's legendary background. Tworek is a veteran member of OpenAI, having joined the company in 2019. He is a key figure behind OpenAI's reasoning models o1 and o3, pushing reinforcement learning to its limits and bringing reinforcement learning and reasoning models into the mainstream. Additionally, Tworek has made significant contributions in coding and agents.

On January 7 of this year, Tworek shared his departure news on the X platform, where many OpenAI leaders expressed their reluctance in the comments.

The interview lasted 70 minutes, with nearly 20,000 words transcribed. Zhidongxi summarized eight key insights shared by Tworek:
1. OpenAI's innovation dilemma: Multiple factors such as costs and growth pressures have affected OpenAI's "appetite" for risk, and the company has yet to find a good cross-team research collaboration model.
2. The rise of Google: Rather than saying Google is "returning," it is more accurate to say that OpenAI made mistakes and failed to fully capitalize on its leading advantage. OpenAI should have maintained its lead.
3. Industry ailments: The paths of the five leading AI companies have completely converged, and researchers looking to do something different outside the mainstream machine learning paradigm find it nearly impossible to find suitable places, which is frustrating.
4. Talent war: The talent war has turned into a soap opera, with some people frequently changing jobs while the actual time spent on work is minimal.
5. Innovation engine: Star AI researchers are not the core drivers of innovation; whether the company can create an environment that fosters personal responsibility, allows exploration, and enables significant achievements may be more critical.
6. What hinders innovation: The factor hindering research at AI labs is not a shortage of computing power but a lack of focus. For OpenAI, "concentrating efforts on major tasks" has become somewhat difficult.
7. AGI Timeline: AGI still lacks key pieces, with architectural innovation and continual learning being the two major missing directions, but Tworek expects AGI to arrive by around 2029.
8. The Return of Reinforcement Learning: History has repeatedly proven that good ideas often come back. It is not difficult to judge whether an idea is important; the challenge lies in determining when it will become important.
Here is the complete translation of the interview content:
Intense Competition, Organizational Expansion: OpenAI's Innovation Dilemma
Host: Your resignation statement was well-written and full of emotion. You experienced a very important period at OpenAI and witnessed tremendous changes. How does that feel?
Jerry Tworek: Every year, OpenAI has been a completely different company: the company itself grew rapidly, and the entire AI world changed along with it.
I feel this experience is rare in human history. I am grateful to have lived through it all. As I mentioned before, each stage has been completely different.
Host: OpenAI had about 30 people in 2019? Now it must be several thousand, right?
Jerry Tworek: To be honest, it's hard to keep track. Offices around the world, spread across various locations. It's almost impossible to find someone who hasn't heard of OpenAI. When I first joined, there were just a few small teams, each working on their own research projects.
But one thing has remained unchanged—OpenAI's ambition. From the very beginning, it aimed for AGI, wanting to truly change the world and bring about a positive impact. And through ChatGPT, it has genuinely distributed intelligence and practicality to users worldwide, which I think is an incredible achievement.
Host: So after you posted that tweet, did all the foundational model labs around the world come to you?
Jerry Tworek: Indeed, many did. I am also thinking about what to do next. After so many years in this industry, I have met a lot of people. I am not in a hurry to make a decision.
I have been working intensively for many years, and I haven't had much time to really chat with people. Now is a good time to slow down and think about how I want to spend the next seven years. But indeed, I am communicating with many people.
Host: You mentioned in your tweet that you want to do some research that OpenAI cannot do. Can you elaborate on that?
Jerry Tworek: Currently, the competition for the "best AI model" globally is exceptionally intense and harsh. To remain competitive, companies face significant challenges on multiple operational levels.
One core issue lies in the willingness to take risks: from the perspective of avoiding falling behind, companies are naturally forced to consider how much risk they are willing to take. Whether it is user growth metrics or the ongoing high costs of GPUs, the reality is extremely harsh.
For this reason, continuously demonstrating strength and consistently launching the strongest models has become crucial for everyone. This is the situation that almost all major AI companies currently face, and this pressure will undoubtedly affect an institution's "appetite" for risk.
Another set of equally difficult-to-weigh factors comes from organizational structure. Companies have their organizational charts, and the organizational structure often largely determines what kind of research you can conduct: each team needs a clear identity, research boundaries, and a set of problems it focuses on solving.
Cross-organizational research is often exceptionally difficult, and how to efficiently organize research on a large scale may not have been truly resolved yet.
Research itself prefers vitality, and one could even say it prefers a certain degree of chaos; however, large organizations require order, structure, and clear division of labor. This is precisely why the saying "what you ultimately deliver is your organizational chart" is so popular: research work often evolves into projects that best fit the existing personnel configuration.
It is in this context that I realized there are some research directions I truly want to pursue that are not supported by OpenAI's current organizational structure.
Transformers are definitely not the final form; many paths have yet to be systematically explored
Host: I once discussed this issue with Mark Chen (OpenAI's Chief Research Officer) on a podcast—almost everyone presents their ideas to him (and Jakub, OpenAI's Chief Scientist). OpenAI indeed has a fine tradition: a willingness to take risks and to do things that other labs dare not do.
But the reality is that no matter how many smart people are gathered and how considerable the resources are, it is ultimately a resource-limited company. It must make significant trade-offs: which directions are worth investing in, and which cannot bear the costs at this time.
And the truly novel paths are often precisely those hesitant directions—we do not know whether we should pursue them now, nor do we know if the budget can afford it.
Jerry Tworek: Regarding the concept of the "research era" proposed by Ilya, I am not sure if it is as binary as he describes, but I am certain that there are still many possibilities in the AI and machine learning fields that have not been fully explored.
Six years ago, we chose the Transformer architecture, and since then, people have been continuously scaling it with significant results. The path is very clear: train larger models each quarter, using more computational resources and data, and progress seems to have never truly stagnated.
But the question is: is this all there is? Is this the final form? I am quite sure it is not. Models can still be improved in various ways, many of which have yet to be systematically tried. As you mentioned, I have invested a lot of work in scaling up reasoning and reinforcement learning. Before that, the entire field had almost bet everything on scaling Transformer pre-training.
This approach has indeed been effective: each pre-training can create stronger models, with overall improvements in their capabilities and corresponding enhancements in various evaluation metrics. Therefore, it is easy for people to conclude that as long as pre-training continues to scale, the models will keep getting better.
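The pre-training scaling laws referred to here are commonly summarized as a power law relating loss to compute. One widely cited textbook form (a simplification added for context, not a formula from the interview) is:

```latex
L(C) \approx L_{\infty} + \left(\frac{C_0}{C}\right)^{\alpha}, \qquad \alpha > 0
```

where L is the pre-training loss, C is the training compute, and the remaining constants are empirically fitted: loss keeps falling as compute grows, but ever more slowly.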
However, later on, some researchers began to believe that there was more we could do. They set out to prove that if we scale up reinforcement learning on top of language models, with compute comparable to that of pre-training, we can teach models capabilities that could never be obtained through pre-training alone.
It is precisely because of this exploration that we have today's intelligent systems that can automate complex tasks while significantly reducing computational and data requirements. Once new scaling paths are discovered, new capabilities can be unlocked, whereas if we only followed the scaling laws of pre-training, these capabilities might take an extremely long time to emerge.
In my view, since the release of GPT-4, reasoning models represent a truly significant leap in capabilities. And I firmly believe that breakthroughs like this are not isolated cases. Researchers should not be satisfied with incremental improvements but should continuously think about how to fundamentally change the rules of the game.
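The recipe Tworek describes, scaling reinforcement learning with a verifiable reward on top of a pre-trained base model, can be illustrated with a deliberately tiny toy. In the sketch below, the arithmetic task, the tabular softmax "policy," and the plain REINFORCE update are illustrative assumptions, not OpenAI's actual setup; the point is only that a reward checking whether an answer is correct is enough to sharpen a base policy.

```python
import numpy as np

# Toy sketch of "RL with a verifiable reward on top of a base policy".
# Illustrative only: a tabular softmax policy over candidate answers to
# single-digit additions, sharpened with plain REINFORCE.

rng = np.random.default_rng(0)

QUESTIONS = [(a, b) for a in range(10) for b in range(10)]  # "a + b = ?" tasks
ANSWERS = list(range(19))                                    # candidate answers 0..18

# "Pre-trained" policy: uniform logits, i.e. no task knowledge yet.
logits = np.zeros((len(QUESTIONS), len(ANSWERS)))

def sample_answer(q_idx):
    z = logits[q_idx] - logits[q_idx].max()
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(ANSWERS), p=p), p

LEARNING_RATE = 1.0
for step in range(20_000):
    q_idx = rng.integers(len(QUESTIONS))
    answer, p = sample_answer(q_idx)
    reward = 1.0 if answer == sum(QUESTIONS[q_idx]) else 0.0  # verifiable reward
    # REINFORCE: push up the log-probability of answers that earned reward.
    grad = -p
    grad[answer] += 1.0
    logits[q_idx] += LEARNING_RATE * reward * grad

correct = sum(sample_answer(i)[0] == a + b for i, (a, b) in enumerate(QUESTIONS))
print(f"accuracy after RL: {correct}/{len(QUESTIONS)}")
```

Even in this toy, the only training signal is whether the sampled answer checks out, which is, broadly, the kind of feedback the interview describes reinforcement learning as scaling up to far harder tasks.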
Convergence of Leading AI Players: A Regrettable Situation
Host: Last year at NeurIPS, Ilya mentioned, "We are running out of data," implying that pre-training will eventually hit a bottleneck.
Jerry Tworek: I don't think this means that pre-training is coming to an end; it is still improving and there is still much room for optimization. But pre-training is not the only way to enhance model capabilities, and in many cases, its improvements are very slow. Other methods might be able to drive capability leaps more quickly.
Host: There has long been an interesting phenomenon in Silicon Valley: tech companies often propose ideas that seem strange or even heretical to outsiders, and it is these ideas that give rise to truly disruptive innovations.
But once a certain path is proven successful, the situation quickly reverses, forming a strong consensus, and everyone starts competing in the same direction.
This is exactly the stage we are currently in. The model competition has been ongoing for two to three years, and almost all major labs are doing the same thing. Do you think this is a problem?
Jerry Tworek: I find this quite, quite regrettable; almost all companies are doing the same thing as OpenAI. OpenAI has undoubtedly achieved great success, done more things right, led the paradigm of scaling Transformers, and demonstrated that large-scale models can create real and widespread value for the world.
But today, how many companies are doing almost the exact same thing? Competition certainly has its value, but we currently have about five serious AI companies using almost the same technical formula to build slightly different products on the same technological foundation.
Perhaps this is the right path, but I still hope to see more diversity and real differences between models, rather than just minor tweaks.
If you observe the top models today, very few can truly distinguish the differences between them. Perhaps we should conduct more blind tests: let users converse with different models and see if they can discern the differences.
I suspect 99.9% of users cannot. These models are extremely similar in experience, even if they come from different teams and employ slightly different technical choices. In such an environment, where is the real exploration? Where is the true innovation and the ability to differentiate from others?
Substantive Disagreements with OpenAI: Separating Is Healthier than Forced Collaboration
Host: I have a somewhat pointed question: you are regarded as a legendary figure both inside and outside OpenAI, and the projects you are involved in have a very high success rate. If someone like you feels that what they truly want to do is difficult to advance within the company—regardless of whether the company explicitly opposes it, this resistance already exists.
Is this a warning sign for a company that started as a research lab?
Jerry Tworek: My view is that sometimes people grow to a point where they need to part ways with the past. It is extremely important for a company and its members to reach a consensus on goals and direction.
At some point, I realized that my views on the future research direction diverged from the direction chosen by OpenAI on a substantive level. In this case, separating may be healthier than forced collaboration.
That’s why I also believe that if different companies can truly focus on different things, the industry will become better. Focus is crucial for a company, and OpenAI is likely doing all the right things.
Perhaps I just harbored some unrealistic dreams. I am a relatively optimistic person, and I believe there are always many different things to do in the world, which is entirely possible in principle.
The key is focus, to do the truly core things to perfection. In fact, many things and many companies can only survive and enter the next stage by doing this.
In an ideal world, there should be a multitude of companies doing different things. Especially for researchers, it is difficult for them to invest long-term in a research direction they do not truly believe in. They should be able to find a place where they can engage in the research they are most passionate about and let time test its value.
That’s why I feel somewhat sad that almost all companies today are doing the same thing. The reality is that if you want to do something different outside the mainstream machine learning paradigm, there is almost no suitable place to do so. This may be the most frustrating point for me at the moment.
Host: When you start to seriously think about "what to do next," this issue of homogenization becomes particularly evident. If all labs are doing the same thing, you naturally won't feel that simply switching to another large lab will provide you with truly different space.
Jerry Tworek: I am indeed thinking about the next stage of my life, but it would make me happier and easier to make decisions if there were more choices in the world that allowed people to slightly deviate from the mainstream and do things that are not so popular but may be equally important.
Host: This raises a question: What do we really need to truly deviate from the mainstream?
A company that has invested so much money and resources and is in the spotlight will instinctively fear taking risks. But the problem is that these risks may be precisely what is necessary. So, what exactly needs to change? Will this situation change in the future?
Jerry Tworek: Interestingly, I personally really enjoy taking risks, and others often describe me that way. Taking risks is inherently a good thing.
However, when risks are tied to huge amounts of capital, those who are willing and able to take such risks become extremely rare. Risk tolerance is a highly personalized and extremely unique trait. I have worked with many people and deeply understand this.
I genuinely believe that people should be more willing to take risks and try more different things. Especially for researchers—nowadays, the salary levels in the AI field are quite exaggerated, which may bring about a side effect: people become unwilling to lose their jobs and reluctant to experience poor performance cycles. As a result, they tend to chase short-term returns.
Many researchers are very smart and have great ideas, but the entire system's incentive mechanism is too shortsighted. Yet it is precisely researchers who should be encouraged to take risks and make bold attempts—because true progress is generated this way.
The Computing Power Barrier Hinders Innovation: The Trade-off Between "Exploration and Exploitation" Is the Key Issue
Host: Of course, we have also seen some examples. For instance, gaming pioneer John Carmack went to the "cave" in Dallas, where he worked almost alone for a time, and now there may only be a handful of employees. Carmack once said, "Maybe I can't create something truly different, but at least someone should be seriously trying a completely different path."
I have also talked to Ilya, but I am not clear on what exactly he is researching. So I cannot judge whether his work continues the past direction or is some kind of more radical attempt. But it is certain that if he did not think it was a different path, he would not have raised so much money to do it. Yann LeCun clearly has ideas that differ from the mainstream.
This is precisely what makes me find this field very interesting. AI is, in a sense, a very old field that can be traced back decades; however, the current mainstream paradigm is relatively new. When I communicate with researchers, they still say, "As long as you read the main papers, you can quickly catch up."
But I often wonder, will there be someone who suddenly comes up with an extremely radical, entirely new idea that completely pushes the entire field forward? Nowadays, this seems to have become more difficult because you may need a data center the size of a state to support the experiments.
Jerry Tworek: This is a huge resource barrier, and it indeed makes the problem more complicated. But it is also a problem worth serious consideration and attempts to solve.
There is a vast amount of academic research happening around the world, with many students exploring various avenues, but the vast majority of them severely lack resources. As a result, many potentially promising studies ultimately come to nothing because truly critical research often requires large-scale experiments.
For this reason, I am very encouraged by a current trend: a considerable amount of funding is indeed starting to flow toward attempts that support novel and radical ideas. People like Carmack, Ilya, and Yann LeCun are exactly the kind of individuals who should exist and be funded at this time.
Clearly, not all attempts will succeed, but some certainly will—innovation in the world happens in this way.
In the field of reinforcement learning, the trade-off between "exploration and exploitation" has long been a classic concept. Even when we optimize agents, we always face this question: should we choose strategies that have been proven effective and have clear successful paths, or should we try new methods to solve old problems in different ways?
This is a difficult but unavoidable trade-off. When we think about how agents should act, we might also reflect on how we make our own choices.
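For readers unfamiliar with the term, the exploration/exploitation trade-off is easiest to see in the classic multi-armed bandit. The sketch below uses the standard epsilon-greedy rule with made-up reward values: a small fraction of steps goes to unproven options, and the rest goes to whatever currently looks best.

```python
import numpy as np

# Classic multi-armed bandit with an epsilon-greedy policy: the textbook
# illustration of the exploration/exploitation trade-off. Arm reward means
# are made up for the example.

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.8])    # unknown to the agent
estimates = np.zeros(3)                   # the agent's running value estimates
counts = np.zeros(3)
EPSILON = 0.1                             # fraction of steps spent exploring

total_reward = 0.0
for t in range(10_000):
    if rng.random() < EPSILON:
        arm = int(rng.integers(3))        # explore: try something unproven
    else:
        arm = int(np.argmax(estimates))   # exploit: use what already works
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    total_reward += reward

print(f"average reward: {total_reward / 10_000:.3f} (best arm pays {true_means.max():.1f})")
```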
Host: As for that small circle of top AI researchers, do people really understand what Carmack is doing?
Jerry Tworek: To be honest, I am not entirely sure. My impression is that he is heavily betting on end-to-end reinforcement learning in video games using a mouse and keyboard. If I remember correctly, that's roughly it.
And that is precisely what I find very interesting. For a long time, I have believed that video games are one of the best environments for training intelligence. Games are designed for the human brain; to make them interesting for humans, they blend stories, power fantasies, puzzles, and problem-solving, and they must continuously maintain freshness without becoming repetitive.
In a sense, video games are a learning environment tailored for human cognition, and problem-solving abilities are exactly what we want agents to possess.
However, we still do not have truly intelligent models that can operate stably in such high-frequency, multimodal environments. This may expose certain architectural limitations. But I still believe that training AI on video games is a very promising endeavor.
The father of reinforcement learning, Richard Sutton, has done a lot of related work in the past, not only in video games but also in complex games like poker. I have been to his lab. Of course, the game environments he worked with back then were much simpler than Dota, which we later had models play at OpenAI. DeepMind CEO Demis Hassabis has also long insisted on similar ideas.
Good ideas often come back around
Host: Interestingly, these ideas were once considered "outdated." In the era of ChatGPT, they do not seem to be the mainstream direction.
Jerry Tworek: The history of science repeatedly tells us that good ideas often come back around. It's not difficult to judge whether an idea is important; the challenge lies in determining when it will become important.
Seven years ago, when I first joined OpenAI, game-based reinforcement learning was the absolute hot topic. We solved Dota and StarCraft. At that time, DeepMind's AlphaGo was a milestone.
However, these models have a very obvious problem: they have almost no world knowledge. They learn how to play a specific game from scratch without truly understanding our world.
Clearly, this is not the right path. Models first need to form a high-level understanding of the real world, not just react to pixels. Reinforcement learning from scratch is more like a "lizard brain" or "monkey brain" way of learning. What we really want is for models to have more abstract conceptual structures.
After years of large-scale pre-training, we have finally obtained an extremely rich and robust representation of the world. Now, it is time to reintroduce reinforcement learning based on this foundation. The real magic of reasoning models lies in the fact that they build a hierarchy of capabilities on top of a powerful world representation through reinforcement learning. This is the direction of the future.
Host: As for world models, Google has done related explorations, and the research by Yann LeCun and Fei-Fei Li also points in this direction to some extent. We, as infants, do not live in a black box; rather, we understand the world through constant exploration. Therefore, combining world models with reinforcement learning seems very reasonable to me.
Jerry Tworek: This idea is clearly correct. The truly interesting part is how we can combine the representation building of world models with reinforcement learning. Reinforcement learning is used to teach models various skills, and these skills are essential for the models to operate in the real world—it gives models the ability to achieve their own goals.
However, to achieve goals, models must first understand the world they are in; only with this understanding can they form effective plans and strategies. This is precisely why world models and reinforcement learning must develop in tandem. Once someone successfully applies reinforcement learning on a well-trained world model, it will be an extremely exciting and milestone moment.
Architectural Innovation and Continual Learning: Two Key Pieces AGI Still Lacks
Host: What are you most interested in right now?
Jerry Tworek: Overall, I think simply repeating what has already been done in the lab is not very meaningful. There are still many adjustments and improvements that can be made within the existing paradigms and setups, but there are two directions that I feel are either significantly underestimated or at least have not received enough resources to truly advance.
The first direction is innovation at the architectural level. We have somewhat become too complacent with the Transformer architecture. It is undoubtedly a great architecture and has been explored extremely deeply.
As people make local improvements to the Transformer, attempting to enhance it through some minor structural adjustments, they indeed encounter many difficulties. Of course, there have also been some quite successful attempts—such as sparsity, which is evidently very successful, and various methods to reduce the computational cost of the attention mechanism have also achieved good results.
But the question is: Will the Transformer be the ultimate architecture for machine learning? Clearly not. Although the creators of the Transformer have done an outstanding job, almost defining the development landscape of machine learning for the next decade, the story is far from over.
There must be other methods for training large models—they may look somewhat like Transformers or may not resemble them at all. This is a question worth investing energy to explore. If no one is going to do this, I would be more than happy to try it myself.
The second direction is a hotter topic, but I don't think anyone has really done it well yet, which is continual learning, and how to truly and thoroughly integrate test time with train time.
For humans, this approach is the most natural: we do not have a clearly separated "learning mode" and "answering mode"; everything happens continuously and simultaneously. Our models should also operate more closely to this way.
This is likely one of the key capabilities we are still missing before achieving AGI. If models cannot continuously learn from the data they encounter, then no matter how powerful they are in other aspects, they will still give a sense of being limited, even somewhat "dull."
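As a rough intuition for what "integrating test time with train time" could mean, consider a model that takes a learning step on every example it serves. The online logistic-regression toy below is a deliberately simplified assumption, not a description of any lab's system; it only shows that a model which keeps updating can track a data distribution that drifts mid-stream, whereas a frozen one would go stale.

```python
import numpy as np

# Toy illustration of blurring "train time" and "test time": a model that takes
# one gradient step on every example it serves, so learning never stops.
# A deliberately simplified assumption for intuition, not any lab's method.

rng = np.random.default_rng(2)
w = np.zeros(5)          # online logistic-regression weights
LEARNING_RATE = 0.1

def serve_and_learn(x, label=None):
    """Answer the query; if feedback arrives, learn from it immediately."""
    global w
    prediction = 1.0 / (1.0 + np.exp(-x @ w))
    if label is not None:
        w += LEARNING_RATE * (label - prediction) * x   # one SGD step
    return prediction

# Simulated stream whose underlying rule drifts halfway through: a model that
# keeps learning at "test time" tracks the drift instead of going stale.
errors = []
for t in range(2_000):
    true_w = np.array([1, -1, 0.5, 0, 2]) if t < 1_000 else np.array([-1, 1, 0, 2, -0.5])
    x = rng.normal(size=5)
    y = float(rng.random() < 1.0 / (1.0 + np.exp(-x @ true_w)))
    p = serve_and_learn(x, label=y)
    errors.append(abs(y - p))

print(f"mean error, first half: {np.mean(errors[:1000]):.3f}; second half: {np.mean(errors[1000:]):.3f}")
```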
Host: Speaking of AGI, when we chatted last time, I mentioned that compared to a year or two ago, I don't hear discussions about timelines as often now. Even discussions about AGI itself seem to have decreased. So I'm actually quite curious.
You describe yourself as cautiously optimistic about AI. So where do you think we currently stand on the AGI timeline?
Jerry Tworek: Yes, my personal view has actually been updated a bit. I have always believed that scaling up reinforcement learning is a necessary component for achieving AGI. About a year or a year and a half ago, I was almost convinced that as long as we scaled up the reinforcement learning of the model, it would become AGI.
Now, I have to slightly revise that view. However, some things can only be seen clearly once you truly enter the next stage. We must also acknowledge that today's models are already performing quite well in many, many aspects.
What they can do in programming is particularly astonishing to me—because writing code is one of my favorite things to do. You can now accomplish a large amount of work very, very quickly.
If you showed people from ten years ago the capabilities we have today, some of them might already have called it AGI. So discussing AGI is no longer as absurd or crazy as it used to be.
But at least by my own definition, the current models still cannot be considered AGI, because continual learning has not been integrated into them in any substantial way. At the same time, judging by where the models stand today, there are still significant deficiencies even in capabilities like multimodal perception.
If the model cannot see the external world, or cannot watch videos and understand them well, can we really call them AGI even if they excel in text understanding and programming?
Therefore, to truly achieve the milestone of building AGI, there are many issues that I refer to as "necessary steps" that need to be resolved.
For a while, I thought that if we really work very hard, if everything is done very well, perhaps 2026 will at least become a year in which we make breakthroughs in truly excellent continuous learning and genuinely universal reinforcement learning.
My timeline judgment is still somewhat fluid. But at the same time, the pace of development in the AI field is indeed very fast. Investment is continuously growing every year, and more and more people are entering the AI field, which expands the talent pool and increases the number of ideas we can explore.
So I don't think this idea is completely absurd or unrealistic. It might happen a bit earlier, or it might be a bit later—perhaps in 2026, or maybe 2027, 2028, or even 2029. I don't think it will be much later than that.
Of course, there is still a lot of work to be done, but indeed many people are working hard to achieve AGI.
We are in a transformative era; it is necessary to remain concerned and cautious.
Host: If my memory serves me right, before the "Strawberry" project emerged, were you researching the Q* project? There was a lot of buzz at that time, and everyone was talking about how Ilya saw Q* and knew that AGI had arrived, which scared everyone.
What I mean is, hearing you say that just now makes it seem a bit funny. Because this is indeed a very tricky thing: these systems can do some extremely impressive things, and so we become extraordinarily excited. Then time passes.
You know, just like the current "Strawberry" project—it is indeed incredible and has almost changed the entire field, but I don't feel like I was "scared" the first time I used it.
Jerry Tworek: I understand what you mean. This is a very interesting part of human psychology, and in some ways, it reflects how we interact with technology.
For me, the effects of scaling up reinforcement learning are still very significant, and over time, we will see more such results. Especially in the programming field, this will impact our lives in many different ways.
Today, the experience of undertaking any large-scale programming project is almost worlds apart from a year ago, and we will see these changes in all sorts of things.
When my team and I, along with many people at OpenAI, first saw effective signs of Q* two years ago, we were sitting in a room witnessing a new technology of substantial significance. If at that moment you do not feel even a little bit of fear, a little bit of concern, or have any doubts about "what consequences our actions will bring," then I would consider you not responsible enough in your work.
I believe every AI researcher should ask themselves: If what I am doing is entirely new and possesses unprecedented capabilities, what impact will it have on the world? In fact, many researchers do think this way. Sometimes, people indeed may inadvertently move a step or two too quickly.
So far, AI has not caused any real harm to the world. Issues like "sycophancy" may be debatable, but beyond that, at least as far as we know, no real harm has occurred.
Even so, I still believe that when releasing any new technology to the world, maintaining concern and caution is a very good and healthy response.
We are in a transformative era, a time when many new things are constantly spreading to the world. They will have many impacts, affecting how people live their lives, how they view themselves and others, influencing interpersonal relationships, international relations, and even impacting GDP and productivity.
Sometimes, a line of code written by someone can trigger a chain reaction that flows through all of this like a waterfall. And the responsibility it carries is quite heavy.
Host: These thoughts indeed make a lot of sense, and I have been repeatedly pondering these issues myself. We have probably discussed some of them sporadically before. However, during that time, as the so-called "OpenAI coup" gradually came to light, I found myself subconsciously trying to put myself in your shoes.
But at such a critical moment, a creation that should be understood seriously has instead become an object of fascination, projection, and contention for people. Doesn't that itself evoke a subtle sense of strangeness?
Meanwhile, I see what you have created being thrust into the spotlight, discussed repeatedly by the media before it has truly been understood, and caught up in a near-soap opera-like dispute. At one point, I didn't even know what words to use to describe this feeling—saying "funny" doesn't seem entirely fitting.
Jerry Tworek: It is difficult to separate the tech world, the conceptual world, human emotions, human lives, and the commonalities and differences between humans. We live in a world where there exists an extremely complex network of relationships among key participants in the AI field, spanning multiple levels.
To truly untangle all of this, historians will likely need many years, even decades, to figure out what exactly happened here and what the real situation is.
To be honest, even I only retain very fragmented memories of everything that happened during the "OpenAI coup." Whenever new testimonies emerge or new documents are disclosed, we learn about previously unknown facts. In the future, someone will certainly piece together all the truths, but the world itself is complex.
Perhaps we do need a healthier way to discuss technology, to find a suitable platform for discussion that can resolve these differences to some extent. But we live in a world without perfect solutions and no perfect way to discuss things.
Disagreements Are Inevitable; We Can Rely Only on Ideas, Beliefs, and Dreams
Host: You don't think the X platform is an ideal medium, do you?
Jerry Tworek: Personally, I really enjoy posting on X, sharing ideas with the research community and everyone around me, but the X platform is not a completely serious place. So many times, discussions hover between jokes and seriousness.
So, what is the right solution? One person worries that a technology is too dangerous and advocates stopping research, while another believes it should continue because it can expand human capabilities. The first person then further argues that this isn't even a correct research path, and that we should turn to a completely different direction.
In the realm of technological advancement and scientific exploration, such differences are almost inevitable, and everything is shrouded in the unknown. No one really knows where the future will lead. What we can rely on are only ideas, beliefs, and dreams. In this fundamental uncertainty, we still have to continue living, continue choosing, and often have to learn to find common ground while respecting differences on many critical issues.
Host: Yes, considering the media's intense focus on Q*, narratives like "What did Ilya see" indeed had too much hype, and it escalated almost month after month. I am not unaware of this, but I still feel somewhat confused.
I am curious because many of us are very active on Twitter and have participated, amplified, and even driven this discussion and imagination to varying degrees. So, from your perspective, how do you view this ongoing hype? Do you also feel that it might need to cool down a bit? Personally, I think we should significantly cool it down.
Jerry Tworek: But at the same time, if someone had told you seven years ago that OpenAI would become a trillion-dollar company, build the largest data center in history, and have one of the largest network products in the world, with everyone constantly talking about AI, you would have thought they were crazy. It sounds like hype in itself.
I actually believe that, in many ways, there is substantial content behind the hype. Sometimes it goes overboard, and sometimes it doesn't go far enough, but AI is indeed important and needs to be discussed. I think now no one would consider AI an unimportant topic.
The situation was certainly different a few years ago when many people thought AI was unimportant. But it is now clear that AI may be one of the most important topics in the world, worthy of our continued discussion and in-depth thought.
How fast will the progress be? Which paths are correct? How safe or dangerous is it really? These questions can certainly lead to disagreements and debates, but AI has already deeply integrated into this world and will only become stronger.
Some people frequently change jobs but accomplish very little
Host: I completely agree. But if we temporarily set aside the technology itself, I mean, I've reported on the talent poaching frenzy at Meta. This has turned into a soap opera, a reality show, rather than just a matter of hardcore science. You've been working in this field for so long. I'm just curious, have we crossed the line into the realm of reality shows?
Jerry Tworek: But the question is, who is actually creating this soap opera? It certainly isn't me.
Host: My age is enough for me to have experienced the internet bubble and earlier technology cycles. This time feels much more like a soap opera. Even thinking back to the productivity software wars, it wasn't like this.
A large part of the reason is that the stakes today are simply too enormous. The scale of funding involved, the movement of researchers between various labs, combined with a series of highly dramatized events, has kept the entire situation in a state of tension for a long time.
From the very beginning, I had a strong feeling that San Francisco seemed to have created an independent world for itself. Rather than a bubble, it feels more like we are constantly convincing ourselves that this is the endgame, the stakes are high, and this is a race that could be either extremely exciting or extremely disastrous. Everything is highly tense, which brings additional psychological burdens.
So I do feel that this time is very different. During the internet bubble, everything stemmed from a simple and naive thought: this is so cool, all the information in the world is at our fingertips, and people can connect with each other. Companies emerged later, and the competition for money gradually surfaced. But now it seems that from the very beginning, the weight of the entire world has been pressing down on this matter.
Honestly, I don't know how you all have managed to get through this. I see that whether it's OpenAI, Anthropic, or other labs, everyone is working hard and competing, and the stakes are so high. Being in this state for seven or eight years, anyone would be exhausted. I completely understand why you would want to take a break.
This is not just a physical drain, but also a psychological wear and tear. Because once you truly accept this setup, it will continuously erode you.
Jerry Tworek: Indeed, all of this brings psychological wear and tear. But I can tell you that someone who was much more experienced than I am in dealing with pressure once told me: every time you go through a high-pressure moment, it's like doing a push-up; your ability to withstand pressure increases a little bit.
To be honest, these seven years of work have indeed trained me to have strong psychological and emotional resilience. At least I genuinely feel that I can block out a lot of noise and unnecessary distractions, trying to remain stable and steadfast no matter what happens, whether the company is on the verge of collapse, researchers are frequently moving, or projects are constantly being reassigned.
There will always be some things happening. I have also heard people compare talent poaching to sports team transfers. The reason sports leagues can operate relatively orderly is that they have clear role divisions and explicit transfer rules, specifying when players can move and when they cannot. Unfortunately, California law has almost no real restrictions in this regard.
I do believe that establishing some rules in this area could be a good thing. Because in this industry, there is indeed a phenomenon where some people frequently change jobs, while the actual time they invest in work seems to be less. This situation is occurring and is not uncommon.
Host: So, how about putting a salary cap on the AI field?
Jerry Tworek: Indeed, some people are frequently job-hopping, while others are still sticking to their work, striving to push the frontier forward. However, AI is undoubtedly already a big business.
Host: Just the other day, I was chatting with colleagues about needing to compile a list of everyone who has worked at leading AI institutions, noting how long they stayed at each place. There must be quite a few who have completed the "Bay Area Grand Slam," having worked at every company.
Revealing the "Polish Mafia" Inside OpenAI: Diligence is an Important Quality
Host: Can we talk about the "Polish Mafia"? When I first started writing this book about OpenAI, around 2018, there were only about thirty people in the entire company. A significant portion of this initial group came from Poland, surprisingly many. They were almost all mathematical geniuses, some had known each other since childhood, while others did not.
However, this does reflect, to some extent, the excellence of the Soviet education system in cultivating mathematical talent, or it could simply be that once one person went to OpenAI, everyone else who knew them followed suit.
Jerry Tworek: Personally, I did not know anyone at OpenAI before I eventually joined; coming to OpenAI was purely a matter of chance.
But in the early stages of OpenAI, the proportion of Polish people was indeed very high. I do not think this trend can be sustained in the long term. Now, the absolute number of Polish employees is greater than in the early days, but considering the company has grown by hundreds of times, this proportion is actually not high.
However, our education system does have something to offer. But I have not personally experienced other education systems, so I cannot truly judge whether the Polish education system is really that outstanding.
Poland indeed has many outstanding talents. And one thing I greatly admire about Poland is that Polish people are very diligent. In fact, over time, especially in many developed countries, hard work seems to be increasingly undervalued. Life has become more comfortable, and people have more other things to focus on and prioritize, which is perfectly normal. But Polish people do place a high value on diligence.
Before I was born, Poland was a communist country. The very year I was born, the country transitioned to a free market economy. This process was quite brutal, but society embraced the change, striving to become more entrepreneurial, to fight for its own future, and to achieve economic prosperity. And it turned out to be successful.
I am an expatriate and no longer live in Poland. But every time I go back, about once or twice a year, I can clearly see the country continuing to build and develop. I see it becoming better, more beautiful, and more prosperous. It truly is an amazing story.
Host: Are you considered a celebrity there? I always feel like the Polish government might be thinking: Damn, we could have made this happen. We should have kept these people here. I went to Poland last year, and I know they have realized this. Almost everyone asks: Do you know Wojciech (one of the co-founders of OpenAI and one of the few early OpenAI members still working at OpenAI)?
Jerry Tworek: Wojciech is truly an amazing person, very friendly. However, Silicon Valley is also completely unique, with ambition, scale, and vitality that is not easily replicated anywhere else in the world. But I can assure you that Poles are very hardworking and can see through "hype." This really can take you far in life.
Behind Google's Comeback Are OpenAI's Mistakes
Host: Are you surprised by Google's comeback, or resurgence? It seems they have done a lot of things right; did you always think they would eventually get their act together and catch up? Or is this actually a surprise?
Jerry Tworek: Personally, I think it's less about Google's "comeback" and more about OpenAI making some mistakes. Although OpenAI has done many things right, it has made a few mistakes even in ideal conditions, and its execution speed has been slower than it could have been.
If you are a leading company and have all the advantages that OpenAI has, you should always stay ahead. But if you make wrong decisions along the way while others make the right ones, then others will catch up.
Google has indeed done many things right; they have significant structural advantages in hardware, talent, and more. When OpenAI was just starting, Google was clearly the number one in almost all machine learning and research directions.
OpenAI's ability to stand out mainly comes from a steadfast belief in a specific direction and path. The world took an extremely long time to realize that this was a good belief, a good direction.
Even when GPT-2, GPT-3, and GPT-3.5 were being trained, not many people really paid attention. You go to NeurIPS and talk to researchers, and everyone thinks OpenAI is pretty cool, but other labs often say: Well, we can replicate it sooner or later. Those large language models are quite interesting, but that's about it.
Only when OpenAI started to truly make money through ChatGPT did other companies suddenly realize: oh, this thing can now be profitable, we really need to do this.
This gave OpenAI an extremely long time window, from building technology to achieving commercialization, while others only later realized "we really, really need to do this." Google also only started to take large language model training seriously from that point on.
And because OpenAI has not fully capitalized on its leading advantage, Google is now very, very close in terms of model capability and training. This is good news for Google, and I would like to congratulate them for turning the situation around and executing exceptionally well.
Host: What mistakes were made? I remember when I reported on your launch of the search feature, the outside narrative was: OpenAI launches search, and Google is done for. I thought at the time, I'm not sure it would be like that. So, what were the specific mistakes?
Jerry Tworek: I don't want to delve too deeply into the details of internal decision-making, which were right and which were wrong. But I want to emphasize again: under ideal execution circumstances, if you start ahead, you should maintain that lead.
OpenAI Needs to Accelerate Progress, Anthropic is Admirable
Host: It seems you believe OpenAI has made some technical mistakes, while some dramatic events within the company have slowed progress at certain stages. I've talked to enough insiders at OpenAI, and they have been thinking about how the company should move forward. Then at some point, a group of key figures left. But it sounds like you were talking more about technical issues earlier.
Jerry Tworek: These things are sometimes related. Technically, I don't think personnel turnover itself is a serious issue. In any company, comings and goings should be normal. But sometimes, personnel departures are indeed a sign of deeper issues.
However, if someone in the company says, "Someone is doing the wrong thing, we no longer believe in this company, we should leave," that may indeed indicate deeper problems. But as I said before, some things could clearly progress faster.
Host: As you mentioned, major labs are doing similar things in overall direction. So Meta is somewhat of a latecomer. Although they have long been involved in AI, it seems now they want to do it differently while poaching talent from other companies.
I'm not quite sure what Meta is specifically doing, but it feels to me like they are not trying to carve out a truly different path, but rather want to take the same path as others. This seems to me to be a fundamental issue. You arrived a bit late but are doing the same thing as everyone else, and the outcome may not be very good. Do you think they really have a different approach?
Jerry Tworek: I'm not particularly familiar with their strategy, so I can't be certain. But from an external perspective, I feel they have realized something: in the current AI world, you can think about what you want to do in two ways.
One is that we want to create a model that is clearly superior to others in certain aspects; the other is that I want to create a model that is equally excellent as others but used in a different way or built around different products.
From my understanding of Meta, this company focuses on connecting people, building relationships, and creating experiences, whether in the metaverse, social networks, or other forms of experiences. I want to emphasize again that this is just my speculation, but I believe their thinking is to leverage AI technologies and Transformers that the industry has already understood and mastered to try to build these experiences.
From the perspective of a highly profitable company with the largest social network in the world, this could be quite a good strategy.
Host: We just talked about Google's return. In the ongoing competition between OpenAI and other companies, is there any AI Lab that has left a particularly deep impression?
Jerry Tworek: I have to say, this is a change that has only recently occurred, but over the past year, my admiration for Anthropic has indeed increased significantly. I've never been particularly focused on the "personality" of models. Although I've heard that Claude has a good personality, maybe.
But what they have done in programming models and programming agents, the brand they have built around these achievements, and the large number of developers they have, are absolutely astonishing accomplishments.
Anthropic started later, has limited computing resources, and a smaller team, facing many difficulties in acquiring quality computing power and hardware, yet they have still successfully built excellent products. These products are changing the way people develop software and, as far as I know, significantly enhancing corporate productivity. Congratulations to them.
Host: They seem to be at a high point. Everyone I know is talking about Claude Code, but I really don't know how they made Claude Code so outstanding that it is as widely loved as ChatGPT. It seems that many labs are indeed drawing on this tool, while some labs have had their access cut off.
Jerry Tworek: Yes. At OpenAI, we are also developing Codex, which is our own programming tool, and it's pretty good. Interestingly, I actually haven't used Claude Code much myself. After all, I was employed by OpenAI at the time, so I didn't use it much.
So I really can't say for sure. But I think Codex is not a bad product. It's just that, from the sentiment on Twitter, Claude is indeed very popular among developers worldwide.
The Lack of Focus in the AI Circle Has Become a Common Problem, Making It Difficult for OpenAI to "Concentrate on Major Tasks"
Host: Based on our previous conversation, you seem to have a strong interest in science at an intellectual level. Your research on reasoning stems from your long-term vision of creating an "AI scientist." When I saw your tweet announcing your departure, I wondered whether you would continue in this competition centered around foundational models or take a different path. I feel you might venture into biotechnology or a similar direction to pursue this goal in a rather different way.
Jerry Tworek: If I could clone myself to do multiple different things, I would really want to do that. But to make a long story short, there are moments when I wake up and realize that I feel quite satisfied and proud of the achievements I've made in my life.
What I really want to do now is to bet on one or two major research directions and do everything I can to make them successful. I believe people should be willing to take risks. I am one of those who is willing to try crazy ideas and has a very high risk tolerance. I feel I should apply this ability to something beneficial.
Host: How long does it take to truly bring the ideas in your mind to fruition? Is it a one-year project? Or is the "high risk" you mentioned something that requires investing four to five years of your life to pursue something that may not be better than existing technologies?
Jerry Tworek: I am absolutely willing to invest a lot of time. At the same time, I believe people should execute quickly; being slow is not a reason to be proud. To execute well on research projects, I hope to get things done as soon as possible.
But the truly important part is still what I mentioned earlier: focus and belief. If you are doing many different things at the same time, it will scatter your attention and resources. Although AI Labs often say they are limited by computational resources, which slows down research, that is indeed one of the important influencing factors. But many times, the more common and widespread issue is actually the lack of focus. After all, the attention you can allocate each day is limited.
I often tell the researchers I collaborate with: reduce the number of experiments, but think more deeply about each one. Because sometimes, even just spending time, like a few hours, without running any programs, but analyzing experimental data more carefully, can lead to breakthroughs more easily than running more experiments.
Host: Institutions like OpenAI, which have a lot of computational resources, are actually just spreading those resources across too many projects. In fact, if those resources were concentrated on fewer projects, the computational power itself would be completely sufficient.
Jerry Tworek: This goes back to the issue of risk-taking and belief. If you are working on three projects at the same time, and one succeeds, the other two may be abandoned. If all three succeed, that would be great, but if you only work on one project, it will progress much faster because you can focus more and have stronger belief.
Of course, if the project ultimately fails, it would be a big problem, but if it succeeds, it could have the best model in the world.
For OpenAI, it is currently a bit difficult to get the entire company to focus on doing something entirely new and different. It is also very hard to completely disregard whether Gemini will have a better model next quarter.
Such things definitely require a specific type of person; only this kind of talent is willing to take risks. That is the key.
Host: I know you can't talk about those so-called "secret formulas." But I'm still curious, what direction is OpenAI heading? Or at least, from a macro perspective, where are they allocating resources? Recently, the news about OpenAI adding ads to ChatGPT has gone viral online.
Jerry Tworek: I shouldn't and can't talk about any plans of OpenAI.
Host: Do you think any of these model companies will have the courage to join advertising like OpenAI? Perhaps the word "courage" isn't accurate, as not having ads might itself be a bad decision. Is monetizing through ads inevitable?
Jerry Tworek: This is a business strategy issue, and my job is to train models.
What OpenAI is really good at is driving innovation from "1 to 100" through the way it operates
Host: I'm not trying to put you on the spot; I'm just trying to clarify some thoughts after this complete conversation. When you talk about the new direction you want to pursue, you do need a certain amount of "horsepower." Will you try it yourself, or do you have to be in a place with enough "energy" to conduct the research you want to do?
Jerry Tworek: This is the primary question I am currently trying to understand. Every AI research still requires GPUs and computing power, and I need to consider what the best approach is.
Host: This is Poland's opportunity. They need to give you a national-level data center.
Jerry Tworek: That idea might be good. I am gradually clarifying my own thinking; I know what types of research I want to pursue and am constantly trying to figure out the best path to achieve them.
Host: I've heard more than once that you are much happier after leaving than before. I heard from someone who is now an entrepreneur that working at OpenAI is even more stressful than starting a business, which shocked me. OpenAI is indeed a rather stressful place.
Host: One last question, aside from the fact that everyone is chasing similar things, have you observed any other significant mistakes in the AI field?
Jerry Tworek: I don't think there are any huge mistakes. It's actually hard for everyone to make the same huge mistake. I think there is only one real issue here: how to balance exploration and the continuation of the original technology path?
Host: My previous question may not have been phrased well; what I really want to ask is, are there any ideas in the research community that you believe are underestimated and haven't received enough attention from the world?
Jerry Tworek: To be honest, there are many such ideas, but what they need most is just a bit more attention, a bit more computational resources, and a bit more spirit to strive for them.
I think there is something quite unique: many researchers like to do work from 0 to 1. Much academic research is like this, creating entirely new ideas, proving that they are feasible to some extent, and then publishing them.
What I believe my team at OpenAI and I excel at, and what I think we do exceptionally well, is advancing research from 1 to 100, which means adopting those different ideas that we haven't worked on before but have been preliminarily validated, and figuring out how to make them work reliably when training large-scale frontier models, while also integrating many other relevant factors.
This is precisely what a lot of academic research lacks. Concept validation is certainly cool, but training one of the world's most capable models using a specific technique requires a lot of very specific and detailed work. If the method is wrong, it could take years, but if you have the right algorithms and know how to introduce these elements, it might only take a few months. This is exactly what I want to try more in the future.
Host: When we talk about some personnel departures from OpenAI, you mentioned that the company should be able to withstand these losses. However, the AI field seems to be driven to some extent by "stars," like Alec Radford. The poaching of talent is also ongoing.
From the behavior of these labs, it is clear that these companies believe AI is a field driven by research stars. I'm curious about your thoughts on this. You seemed a bit hesitant about this issue earlier. The industry has both the long-term accumulated work of the entire academic community and significant breakthroughs that come from a very few individuals.
Jerry Tworek: This is a rather complex topic, but I think two things can be true at the same time. Many times, as you see at OpenAI, it is indeed a very few individuals who have an extraordinary impact, driving a series of groundbreaking results and spreading them throughout the industry. I have seen this happen time and again.
But at the same time, whenever I see people switching companies, I rarely see it having a truly significant impact on the original company. The characteristics of the company itself, or what you might call a kind of "operating mode," is the real research engine, rather than whether a specific researcher is still here.
I have also observed that researchers who jump between companies often do not perform as efficiently in the new environment. Even if they have often done great work in the past, they may become somewhat distracted after arriving at a new place, need time to adapt to the environment, or temporarily lack particularly fresh ideas.
Of course, experience in this field can certainly bring some advantages, but more importantly, it is about creating an atmosphere that fosters a strong sense of personal responsibility, allows for exploration, and empowers people to achieve great things.
Moreover, whether it is this group of people or another group, it is entirely possible to form many teams capable of achieving great results. I do not believe that any specific person is irreplaceable. In my view, a good research structure, a good research culture, and good collaboration methods are far more important than whether a specific person is in your team.
