Does DeepSeek have negative implications for computing power?

Wallstreetcn
2025.01.26 00:55

The computing power implications of DeepSeek have sparked discussion. Analysis indicates that the widely cited $5.5 million training cost is only a fraction of DeepSeek-V3's total and does not include preliminary research costs. The computing power needed to reproduce a given capability falls as algorithms and technology advance, allowing latecomers to avoid waste; but even though training efficiency has improved, overall spending will not necessarily decrease and may instead translate into higher computing power demand. High-Flyer's (Huanfang's) success also represents a victory of open source over closed source, fueling the prosperity of the community.

In fact, the underlying logic has been analyzed many times, from DeepSeek-V3 in December to R1 this week, so let me summarize and organize it here.

  1. The $5.5 million figure widely cited overseas refers to the training cost of V3, not R1, and even then it is only a fraction of V3's actual cost. The original wording in the V3 paper is explicit: the stated cost covers only the formal training of DeepSeek-V3 and excludes the costs of preliminary research and of ablation experiments on architectures, algorithms, and data. As an algorithm engineer in the community once put it, "V3 used High-Flyer's R1 model to generate data; should those repeated attempts be counted in the cost?" Same idea. A quick sanity check of where the headline number comes from follows below.
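
For context, the $5.5 million is itself just a back-of-the-envelope product of GPU-hours and a rental price. The GPU-hour breakdown and the $2/GPU-hour H800 rental price below are the V3 technical report's own figures and assumption:

```python
# Sanity check of the widely cited DeepSeek-V3 training-cost figure.
# GPU-hour breakdown and the $2/GPU-hour rental price are from the V3
# technical report; they cover formal training only, not the
# preliminary research or ablation runs mentioned above.
h800_gpu_hours = {
    "pre-training":      2_664_000,
    "context extension":   119_000,
    "post-training":         5_000,
}
price_per_gpu_hour = 2.0  # USD, the paper's rental-price assumption

total_hours = sum(h800_gpu_hours.values())     # 2,788,000 GPU-hours
total_cost = total_hours * price_per_gpu_hour  # ~= $5.576M
print(f"{total_hours:,} H800 GPU-hours -> ${total_cost / 1e6:.3f}M")
```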

  2. The computational power required for frontier exploration and for latecomer catch-up is not on the same scale. This shows up as an exponential decrease, every N months, in the compute needed to train a model of the same generation. The reasons include algorithmic advances (FP8 mixed precision, MoE), the continued deflation in the cost of computing power, and methods like distillation that condense data. The key point is that exploration implies waste, while latecomers can avoid that waste by standing on the shoulders of giants. For example, the training cost of o1 is certainly far higher than that of GPT-4, and the training cost of High-Flyer's R1 likely exceeds that of V3. From o3 to o4/o5, and from R1 to R2/R3, training compute will only increase. A toy model of the catch-up decay is sketched below.
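
To make "exponential decrease every N months" concrete, here is a toy decay model. The 8-month halving period is purely an illustrative assumption, not a measured constant:

```python
# Toy model: compute needed to reproduce a fixed capability level,
# assuming it halves every `halving_months` months. The halving
# period is an illustrative assumption, not a measured figure.
def catch_up_compute(initial_flops: float, months: float,
                     halving_months: float = 8.0) -> float:
    """Compute required to match a frontier run `months` later."""
    return initial_flops * 0.5 ** (months / halving_months)

frontier = 1e25  # FLOPs of a hypothetical frontier training run
for m in (0, 8, 16, 24):
    print(f"{m:2d} months later: {catch_up_compute(frontier, m):.2e} FLOPs")
```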

  3. A lower cost per training run does not mean overall training spending will fall. Will higher training efficiency lead labs to invest less? No. The real logic is: on top of the higher efficiency, squeeze the freed-up computing power to capture greater gains. Take High-Flyer as an example: despite strong infrastructure-optimization capabilities, cards stockpiled in advance, no major expansion of API services, and a focus on research and training, they are still short of cards. By contrast, some North American labs that spent far more money now look rather awkward... but will they respond by cutting back? No. Digesting and absorbing High-Flyer's open-source methods + having far more computing power than High-Flyer = capturing an even greater gain in intelligence. The real worry for training compute is hitting a wall; improving compute-usage efficiency may actually raise the ceiling of the models themselves. The toy model below makes the reallocation logic explicit.
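
The reallocation logic can be stated as a one-line model: if capability gained scales roughly with efficiency times compute, a lab that adopts the efficiency gains while keeping its full compute budget multiplies its output instead of pocketing the savings. A deliberately simplified sketch, with the 3x efficiency figure as an illustrative assumption:

```python
# Deliberately simplified: treat capability gained as
# efficiency * compute_budget. The 3x efficiency gain and budget
# numbers are illustrative assumptions, not measured values.
def capability(efficiency: float, compute_budget: float) -> float:
    return efficiency * compute_budget

baseline = capability(efficiency=1.0, compute_budget=100.0)

# Option A: adopt open-source methods, cut spending, hold output flat.
cost_cutting = capability(efficiency=3.0, compute_budget=100.0 / 3)

# Option B: adopt the same methods, keep the full compute budget.
reinvesting = capability(efficiency=3.0, compute_budget=100.0)

print(baseline, cost_cutting, reinvesting)  # 100.0 100.0 300.0
```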

  4. High-Flyer's success is a victory for the entire open-source community over relatively closed-source models. Its contributions will quickly translate into the prosperity of the whole open-source ecosystem. If there are real losers, they are the closed-source models. China has already lived through this: dominated by Llama, Chinese closed-source model companies that could not beat Llama 3 were forced to shut down, pivot to applications, or go open-source. Today Chinese open source has caught up with North American closed source... and any closed-source model that cannot beat R1 (and the upcoming R2 and R3) has an API whose value is basically zero. Frankly, though, this process will indeed quickly shrink the number of participants in model training.

  5. Most importantly, everything above is about training, while the future clearly points to greater demand from inference. One point everyone seems to overlook is that High-Flyer's reduction in inference costs is even more striking than its reduction in training costs. Today everyone saw that AMD announced support for DeepSeek-V3. In the words of our guest Y, the elegance of the DeepSeek architecture is that, compared with the standard transformer architecture, it introduces no special operators; in principle it can therefore support many kinds of cards relatively easily... (this, too, was forced into existence by the GPU embargo). Everyone should appreciate the weight of that statement and its implications for CUDA... The people at High-Flyer are all geniuses at hand-writing operators... The sketch below shows what "no special operators" means in practice.
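
To see why "no special operators" matters for portability, the sketch below builds a multi-head-latent-attention-style block out of nothing but matrix multiplies and a softmax, the primitives every accelerator vendor already ships. This is a schematic illustration of the idea, not DeepSeek's actual implementation (which adds details such as decoupled RoPE); all shapes and the latent size are assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Schematic MLA-style attention: keys/values are reconstructed from a
# low-rank latent, yet every step is a plain matmul or softmax.
T, d_model, d_latent, d_head = 16, 64, 16, 64  # illustrative sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d_model))

W_q   = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # down-project to latent
W_uk  = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)   # up-project keys
W_uv  = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)   # up-project values

q = x @ W_q
c_kv = x @ W_dkv                    # only this small latent needs caching
k, v = c_kv @ W_uk, c_kv @ W_uv     # reconstructed with plain matmuls
attn = softmax(q @ k.T / np.sqrt(d_head))
out = attn @ v                      # matmuls + softmax, nothing exotic
print(out.shape)                    # (16, 64)
```

Because nothing here requires a vendor-specific kernel, porting to a new chip is mostly a matter of fast matmul and softmax, which is the point being made about CUDA.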

Does the reduction in inference costs favor or hinder computing power demand? This is easier to reason about than training. Compare the newly launched o1, so expensive that almost no one used it, with Doubao after the API price war began. Cheaper inference will likely trigger a boom in applications, which in turn drives greater demand for computing power; the arithmetic below illustrates the effect.
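
The o1-versus-Doubao comparison is a price-elasticity argument, essentially the Jevons paradox: if a price cut grows usage faster than it cuts unit cost, total compute demand rises. All numbers below are illustrative assumptions:

```python
# Jevons-paradox arithmetic for inference (all numbers illustrative):
# a 10x price cut that unlocks 30x more token demand still grows
# total compute spend 3x.
old_price_per_mtok = 10.0   # $ per million tokens (assumed)
new_price_per_mtok = 1.0    # after an assumed 10x cost reduction
old_demand_mtok    = 1_000  # million tokens/month at the old price
demand_multiplier  = 30     # usage growth unlocked by cheaper tokens

old_spend = old_price_per_mtok * old_demand_mtok
new_spend = new_price_per_mtok * old_demand_mtok * demand_multiplier
print(f"spend: ${old_spend:,.0f} -> ${new_spend:,.0f} "
      f"({new_spend / old_spend:.0f}x)")  # 3x more total demand
```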

Here I want to quote Y's earlier comment from the Planet, which looks very prescient in retrospect: DeepSeek-V3 supports private deployment and autonomous fine-tuning, giving downstream applications far more room to develop than in the closed-source model era. Over the next year or two, we will likely witness a richer array of inference chip products and a more prosperous LLM application ecosystem.

  6. How do we square North America's ongoing infrastructure frenzy with the investments already wasted? U.S. CSPs are indeed still frantically competing for electricity, with commitments extending to 2030. In fact, the major CSPs have spent hundreds of billions of dollars over the past two years, and none of it was purely for training; it was basically driven by their own business needs plus growth in the inference business. Only Microsoft prepared computing power credits for OpenAI; AWS rented computing power to downstream customers for training; Meta/xAI allocated some compute to their own training, but the bulk of it serves their own recommendation-system and autonomous-driving business needs. Moreover, Microsoft has effectively rejected Sam Altman's request to keep going all-in, focusing instead on the more certain returns from inference (Satya said this himself).

Therefore, viewed objectively, High-Flyer's breakthrough does mean that some of North American CSPs' past training investments went to waste. That was a necessary cost of taking risks and exploring new markets. Looking forward, though, the overall prosperity of open source will ultimately benefit these "intermediaries." As we have explained before, they are not the miners taking on the risk themselves; they are the shovel sellers, building more commercially valuable application ecosystems on top of these models (open or closed source). Cards are not only for training; an ever-larger share will shift to inference. If training efficiency lets models progress faster and the application ecosystem flourish, how could they not keep investing?

Finally, let me quote "The Bitter Lesson" once more: in the long run, computing power is the true decisive factor. History repeatedly tells us that AI researchers like to build human knowledge into their algorithms, which usually works in the short term and brings personal achievement and vanity. In the long run, however, it creates bottlenecks and can even block further progress. Breakthroughs ultimately tend to come from the opposite approach: scaling computing power through search and learning. And those ultimate successes often carry a bitterness that is hard to swallow, because the triumph of computing power is a resounding slap in the face to our human-centered thinking and vanity.

Author of this article: Information Equality, Source: Information Equality, Original title: "Does DeepSeek Have a Bearish Impact on Computing Power?"

Risk Warning and Disclaimer

The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their specific circumstances. Invest accordingly at your own risk.