
Is the existing path blocked? OpenAI and Amazon consider changing how large models are trained

The AI research paradigm may undergo a fundamental shift: researchers advocate abandoning the general-purpose recipe of "pre-training followed by fine-tuning" and instead introducing data curated for specific goals (such as coding and customer service) early in the training process. If this approach becomes widespread, AI development will shift from creating "generalist models" to building multiple "specialized models" from scratch, reshaping the industry ecosystem and team structures.
As competition in artificial intelligence moves into deeper waters, top researchers in the industry are beginning to question the prevailing model-training paradigm.
Researchers from OpenAI, Thinking Machines Lab, and Amazon are exploring a fundamental shift: abandoning the current standard process of "pre-training followed by fine-tuning" in favor of an approach that introduces curated, task-relevant data earlier in training, in order to address the inefficiency and "split-brain" problems of existing models.
This potential shift is strongly advocated by David Luan and others from Amazon. The core argument is that the current general training path—first endowing the model with broad world knowledge (such as poetry or gardening), and then fine-tuning for specific tasks (such as coding or customer refunds)—is not always logically sound. Researchers believe that if the ultimate use of the model is already determined, then highly relevant selected data should be introduced during the pre-training phase to serve the final goal more directly.
If this methodological adjustment is put into practice, it will profoundly change the development landscape of the AI industry. This not only means that development teams may no longer need to artificially segment their work into pre-training and fine-tuning, but it also signals a shift in the market from "one general model fits all scenarios" to "building specialized models based on different datasets." This transition will force developers to conduct stricter data screening early in the training process, thereby determining the model's strengths and weaknesses in specific domains.
Signs of this differentiation are already emerging in the market. OpenAI is currently routing ChatGPT queries to different models via routers and developing specialized models such as GPT-5-Codex. This strategy reflects a significant gap between consumer demand for simple chatbots and the company's pursuit of high-end goals such as superintelligence and scientific research (like Mars colonization or disease treatment). If this approach is further deepened, OpenAI may need to completely reorganize its research team to accommodate entirely different model training needs.
Reshaping Training Logic: Abandoning General Redundancy
The current AI training standards somewhat mimic the human learning process, which involves accumulating a broad base of knowledge during childhood and then learning specific skills. However, there is a growing reflection within the industry on the efficiency of this process. David Luan points out that for a model aimed at handling code or customer service, spending a large amount of computational power learning completely unrelated fields (such as poetry or gardening) is a waste of resources.
This "broad net" approach to pre-training, while intuitive, has also led to technical bottlenecks such as the "split-brain problem," where the model may give incorrect answers simply due to variations in the way questions are asked. The new thinking advocates using the pre-training process to engage with selected data that is more relevant to the established tasks. Researchers from OpenAI and Thinking Machines Lab agree with this perspective, with some even suggesting the elimination of independent teams for different training phases, integrating personnel into a unified training team to enhance specificity
The Rise of Specialized Models and Organizational Restructuring
This transformation will have a profound impact on the ultimate form of AI models. Researchers must decide early in the training process which data to include, which will directly determine the boundaries of the model's capabilities. For example, increasing mathematical and coding data while reducing prose data in early training may create an excellent programming assistant but sacrifice its abilities in creative writing or emotional communication with users.
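For a concrete sense of this trade-off, here is a minimal, hypothetical sketch of re-weighting a pre-training data mixture toward code and math at the expense of prose. The corpus names, weights, and token budget are illustrative assumptions, not figures reported by any lab.

```python
# Hypothetical sketch: re-weighting a pre-training mixture toward a coding assistant.
# Corpus names, weights, and the token budget are illustrative assumptions only.

GENERALIST_MIX = {"web_text": 0.50, "books_prose": 0.20, "code": 0.15, "math": 0.05, "dialogue": 0.10}
CODING_MIX = {"web_text": 0.25, "books_prose": 0.05, "code": 0.45, "math": 0.15, "dialogue": 0.10}

def tokens_per_domain(mix: dict, total_tokens: float) -> dict:
    """Split a fixed token budget across domains according to mixture weights."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mixture weights must sum to 1"
    return {domain: weight * total_tokens for domain, weight in mix.items()}

BUDGET = 10e12  # e.g. a 10-trillion-token budget (illustrative)
for name, mix in (("generalist", GENERALIST_MIX), ("coding-specialized", CODING_MIX)):
    allocation = tokens_per_domain(mix, BUDGET)
    print(name, {domain: f"{tokens / 1e12:.1f}T" for domain, tokens in allocation.items()})
```

Under the same fixed token budget, the specialized mix buys roughly three times as many code tokens while cutting the books-and-prose share to a quarter: exactly the kind of early decision that would fix a model's strengths and weaknesses before fine-tuning ever begins.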
This will lead to a future AI market that no longer relies on post-training modifications of the same pre-trained model, but instead sees a surge of specialized models trained on different foundational datasets. According to internal information from OpenAI, the company has already recognized this demand differentiation. On one hand, consumers want ChatGPT to answer simple questions and act as a chat partner; on the other, the company is committed to cutting-edge research on reasoning models and superintelligence.
Currently, although all of OpenAI's models are still based on the same pre-trained base model, the company addresses this complexity through routing technology and specialized variants (such as GPT-5-Codex). If it shifts toward training completely independent models for different purposes in the future, a thorough restructuring of the research teams will be required.
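The routing layer mentioned above can be pictured as a thin dispatch step sitting in front of several models. Below is a minimal, hypothetical Python sketch of that idea; aside from GPT-5-Codex, which the article names, the model identifiers, keyword heuristic, and function names are assumptions rather than OpenAI's actual routing logic, which is not public.

```python
# Hypothetical sketch: routing an incoming query to a specialized model.
# Except for "gpt-5-codex" (a model named in the article), the identifiers and the
# keyword heuristic are illustrative assumptions, not OpenAI's actual routing logic.

CODE_HINTS = ("stack trace", "traceback", "refactor", "compile error", "unit test", "def ")

def classify(query: str) -> str:
    """Toy intent classifier; a production router would use a learned model."""
    q = query.lower()
    return "coding" if any(hint in q for hint in CODE_HINTS) else "general_chat"

def route(query: str) -> str:
    """Dispatch the query to a backend model chosen by intent."""
    backends = {
        "coding": "gpt-5-codex",               # specialized coding model
        "general_chat": "general-chat-model",  # placeholder default model
    }
    return backends[classify(query)]

print(route("Why does this refactor produce a compile error?"))  # -> gpt-5-codex
print(route("Plan a relaxed weekend trip to Kyoto."))            # -> general-chat-model
```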
Hardware Breakthroughs and Capital Bets
While the software training model is brewing a transformation, innovations in the hardware field are also accelerating, with capital closely watching new technologies that can enhance energy efficiency. The photonic chip startup Neurophos has just completed a $110 million Series A funding round led by Gates Frontier, a firm backed by Bill Gates, with participation from Microsoft's venture capital firm M12.
Neurophos is dedicated to designing chips that perform AI mathematical operations using light rather than electrons. According to co-founder and CEO Patrick Bowen, the goal is to deliver a chip by 2028 that is 50 times faster and more efficient than NVIDIA's Blackwell chip. Microsoft executive Marc Tremblay stated that modern AI inference places enormous demands on power and computing resources, and that the industry needs breakthroughs at the compute level.
At the same time, OpenAI is strengthening its infrastructure. OpenAI CFO Sarah Friar revealed at the World Economic Forum that the company's custom inference chips are at the "tape-out" stage, the final step before manufacturing. She also said that the Stargate infrastructure project, announced last year with a value exceeding $500 billion, is more than halfway complete and that "the progress is beyond imagination," with the company training models on servers at Oracle's Stargate campus.
Industry Consolidation and Competitive Dynamics
Mergers and financing activities in the AI sector remain active. According to The Information, Lightning AI, a software company that helps customers customize AI models, has merged with data center provider Voltage Park, valuing the combined company at over $2.5 billion. Additionally, Yelp has agreed to acquire the AI agent startup Hatch for $300 million, and Google DeepMind has hired the CEO and several top engineers from the voice AI startup Hume AI through a licensing agreement.
Among the tech giants, Bloomberg reports that Apple is negotiating with Google to use its cloud infrastructure and TPU chips for an updated version of Siri, with plans to introduce AI-driven wearable devices as early as 2027. NVIDIA CEO Jensen Huang is reportedly preparing to visit China in an effort to regain a foothold in that strategic market.
On the regulatory and ethical front, Anthropic has released a new version of its "Constitution" for Claude, which is less prescriptive than the original 2023 version, gives the model more room for judgment, and, unusually, mentions the possibility that the model may possess some form of "awareness" or "moral status." The White House Council of Economic Advisers has released a report predicting that generative AI will trigger a profound transformation of the U.S. economy, with the potential to significantly boost productivity and growth.
