In-depth conversation with the founder of SemiAnalysis: In the era of AI, will NVIDIA be challenged?

Wallstreetcn
2024.12.24 09:29

Excluding Google, 98% of global AI workloads run on NVIDIA chips. Google's and Amazon's chips currently have their own issues and do not pose a challenge in the short term. The data shortage is a false proposition, since data can be synthesized to continue training. There are no issues with AI capital expenditure next year, but 2026 is uncertain and may be a turning point for the industry.

How large is NVIDIA's market share? What are the company's competitive advantages? What opportunities do AMD, Google, and Amazon have? Is data scarcity a false proposition? Is there really no problem with industry capital expenditure? Where is the turning point?

Recently, Dylan Patel, founder and chief analyst of SemiAnalysis, Bill Gurley, a well-known Silicon Valley tech investor, and Brad Gerstner held a three-way discussion about the current state of AI chips, how long NVIDIA's competitive advantage can last, whether data scarcity is a false proposition, and how long AI capital expenditure can continue.

Here are the key points from the discussion:

Excluding Google, 98% of global AI workloads run on NVIDIA chips; if Google is included, this figure is 70%.

NVIDIA's advantages lie in three areas: its software is superior to that of most semiconductor companies; in hardware, it adopts new technologies first and moves chips from design to deployment at an extremely fast pace; and in networking, its acquisition of Mellanox greatly enhanced its networking capabilities.

Although Google has its own understanding of software and computing elements, it needs to collaborate with other suppliers in challenging areas such as chip packaging design and networking.

With the construction of data centers and the tight supply of electricity, companies need to plan resources more rationally.

Text is currently the most effective data domain, but video data contains more information. Additionally, pre-training is only part of model training; inference time calculation is also important. If data runs out, synthetic data can be created to continue improving the model.

Although the one-time huge benefits of pre-training may have passed, companies can still gain certain benefits by increasing computing resources, especially in a competitive environment. The benefits still exist, but the difficulty of obtaining them has increased.

Synthetic data is most effective in areas where functional validation can be performed.

Wall Street's current estimates of capital expenditure for data centers are generally too low. By tracking global data centers, companies like Microsoft, Meta, and Amazon are spending significantly on data center capacity. This indicates that they believe they can win in competition by scaling up, which is why they continue to invest.

NVIDIA is not Cisco from the year 2000; the valuations of the two are not comparable.

Pre-training may encounter diminishing returns or excessively high costs, but synthetic data generation and inference time calculation are becoming new development directions.

Currently, companies' investments in inference are relatively small. It is expected that in the next 6 months to 1 year, there will be significant improvements in model performance in certain benchmark tests with functional validation.

Currently, GPT-4o is very expensive, but if the model size is reduced, costs will drop significantly.

AMD performs excellently in chip engineering but has significant shortcomings in software. They lack sufficient software developers and have not invested in building GPU clusters to develop software, which contrasts sharply with NVIDIA.

The TPU system Google built in collaboration with Broadcom is competitive in chip interconnect and network architecture, and even surpasses NVIDIA in certain aspects.

The commercial success of Google's TPU is relatively limited, mainly because its software is not open enough, its pricing is uncompetitive, and it is primarily used for internal services.

By cutting costs, Amazon's chips offer advantages in HBM memory bandwidth per dollar. Although they fall short of NVIDIA on technical specifications (memory, bandwidth, etc.), they are attractive for cost-sensitive application scenarios.

From a market perspective, hyperscale data centers plan to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including network equipment suppliers, ASIC suppliers, system suppliers, etc.).

There is some uncertainty regarding the situation in 2026. On one hand, whether model performance can continue to improve will be a key factor. If the pace of model performance improvement slows down, it may lead to market adjustments.

The following is the full dialogue, translated by AI.

Host: Dylan, welcome to our show. Today we want to delve into a fundamental change happening in the computer world that has been discussed throughout this year. Bill, why don't you start by introducing Dylan to everyone?

Bill: Sure, we are pleased to have Dylan Patel from SemiAnalysis with us. Dylan has quickly built one of the most respected research teams in the global semiconductor industry. Today, we want to explore the knowledge Dylan has about architecture, chip scaling trends, major players in the global market, supply chains, and connect it to the business issues that our audience cares about. I hope to provide a summary of semiconductor activities related to the AI boom and try to grasp its development trends as a whole.

Dylan: I'm glad to be here. When I was a kid, my Xbox broke. My parents were immigrants, and I grew up in rural Georgia with not much to do, so I tinkered with electronics. I opened up the Xbox, shorted the temperature sensor, and then fixed it. From that moment on, I developed a strong interest in semiconductors, started reading financial reports of semiconductor companies, investing, and delving into technology-related content.

Host: Can you briefly introduce SemiAnalysis to us?

Dylan: We are a semiconductor and AI research company that serves hyperscale data centers, large semiconductor private equity firms, and hedge funds.

We sell relevant data on global data centers, including power usage and construction progress each quarter; track about 1,500 fabs globally (but the actual key ones are about 50); and provide supply chain-related data, such as cables, servers, circuit boards, transformers, etc., along with forecasting and consulting services.

Excluding Google, 98% of global AI workloads run on NVIDIA chips

Bill: Dylan, we all know that NVIDIA dominates the AI chip market. How much of the current global AI workload do you think is running on NVIDIA chips?

Dylan: If we don't consider Google, the share exceeds 98%. But if we include Google, it's about 70%. This is because a large portion of Google's AI workload, especially production workloads, runs on its own chips.

Bill: Are you referring to production workloads that generate revenue, such as Google Search and Google's other large AI-driven businesses?

Dylan: Exactly. Google's non-LLM (large language model) and other production workloads run on its internally developed chips.

In fact, Google has been using Transformer technology in search workloads as early as 2018-2019, with BERT being one of the very well-known and popular Transformer models at that time, which has been running in its production search workloads for years.

Three Advantages Combine to Make NVIDIA the Market Leader

Bill: So back to NVIDIA, why is it so dominant in the market?

Dylan: You can think of NVIDIA as a three-headed dragon. Most semiconductor companies perform poorly in software, but NVIDIA is an exception.

In terms of hardware, NVIDIA also outperforms most companies; they are able to adopt new technologies ahead of others and push chips from design to deployment at an extremely fast pace. Additionally, in networking, their acquisition of Mellanox greatly enhanced their networking capabilities. The combination of these three advantages makes it difficult for other semiconductor companies to compete with them individually.

Bill: You previously wrote an article that helped people understand the complexities of NVIDIA's modern cutting-edge deployments, including aspects like racks, memory, networking, and scale. Could you give us a brief overview again?

Dylan: Sure. When we look at GPUs, running an AI workload typically requires multiple chips to work together because the scale of the models far exceeds the capabilities of a single chip.

NVIDIA's NVLink architecture does a great job of networking multiple chips together, but interestingly, Google and Broadcom built comparable system architectures even before NVIDIA did, with Google deploying such TPU systems back in 2018.

While Google has its own understanding of software and computing elements, it needs to collaborate with other vendors in challenging areas like chip packaging design and networking.

Now, NVIDIA has launched the Blackwell system, which is a rack containing multiple GPUs, weighing three tons and featuring thousands of cables, making it very complex.

Competitors like AMD have also recently entered the system design field through acquisitions, because building a multi-chip system that can work together, cool well, and have reliable networking is a highly challenging problem, and semiconductor companies typically lack the relevant engineers.

Bill: What areas do you think NVIDIA has made incremental differentiated investments in?

Dylan: NVIDIA has made significant investments primarily in the supply chain. They must work closely with the supply chain to develop next-generation technologies and bring them to market first. For example, in fields such as networking, optics, water cooling, and power delivery, NVIDIA continuously introduces new technologies to maintain its competitive advantage. Their pace is very fast, with many changes every year, such as the launch of products like Blackwell and Rubin. If they stagnate, they will face competitive pressure, as other competitors are also striving to catch up.

Bill: If NVIDIA stagnates, in what areas might they face competition? What conditions do other alternatives in the market need to meet to capture more workload share?

Dylan: For NVIDIA, their main customers have huge spending in AI, and they have enough resources to research how to run models on other hardware, especially in inference.

Although NVIDIA's advantage in inference software is relatively small, their hardware performance is currently the best, which means lower capital costs, operating costs, and higher performance. If NVIDIA stops progressing, their performance advantage will no longer grow, and other competitors will have the opportunity.

For example, now with the launch of Blackwell, NVIDIA is not only 10-15 times faster in inference performance than previous products (optimized for large models), but they have also lowered profit margins to cope with competition, planning to improve performance by more than 5 times each year, which is a very fast pace. At the same time, AI models themselves are continuously improving, and costs are decreasing, which will further stimulate demand.

Bill: You mentioned that the role of software in training and inference is different; can you explain that in detail?

Dylan: Many people simply refer to NVIDIA's software as CUDA, but it actually contains many layers.

In training, users typically rely on NVIDIA's software performance because researchers are constantly trying new methods and do not have much time to optimize performance.

In inference, companies like Microsoft deploy on a limited number of models and update them approximately every six months; they can invest a large number of engineers to optimize the performance of these models on other hardware. For example, Microsoft has deployed GPT-style models on hardware from companies like AMD.

Host: We previously mentioned a chart that shows there will be one trillion dollars in new AI workloads over the next four years, as well as one trillion dollars in data center replacement workloads. What are your thoughts on this? Some believe that people will not use NVIDIA's GPUs to rebuild CPU data centers; how do you respond to this viewpoint?

Dylan: NVIDIA has long been promoting the use of accelerators for non-AI workloads, such as in the professional visualization field (like Pixar making movies) and Siemens engineering applications, which have used GPUs.

Although these are just a small part compared to AI, there are indeed applications. Regarding data center replacement, while AI is developing rapidly, traditional workloads (such as web services and databases) will not stop or slow down because of it. The supply chain for data centers is long, and the construction cycle is also lengthy; this is a real issue.

For example, Intel's CPUs have made slow progress in recent years, while AMD's emergence has provided higher-performance options. Many old Intel CPU servers in Amazon's data centers have been in use for years and can now be replaced with new servers (such as 128-core or 192-core) that offer better performance. This not only enhances performance but also reduces the number of servers under the same power consumption, thereby freeing up space for AI servers.

So, although there are instances of data center replacements, the overall market is still growing; it is just that the development of AI has prompted this behavior, as companies need more computing power to support AI applications.

Host: This reminds me of what Satya mentioned on the program last week. He said they are limited by data centers and power, not by chips. Do you think this relates to your earlier explanation?

Dylan: I think Satya's point highlights that data centers and power are the current bottleneck, which is a different situation from chip supply. With data center construction and power supply both tight, companies need to plan resources more rationally. This also explains why they take measures such as acquiring power resources from cryptocurrency mining companies or extending the depreciation period of old servers.

If there is no data, synthetic data can be created to improve models

Host: Before discussing alternatives to NVIDIA, let's talk about the pre-training and scaling debate you mentioned in your article. Ilya said data is the "fossil fuel" of AI, and we have consumed most of it, so the huge gains from pre-training will not be repeated. What do you think of this view?

Dylan: The pre-training scaling law is relatively simple: increasing computing resources enhances model performance, but it involves two dimensions, data and parameters.
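
For context, a standard back-of-envelope relationship, which Dylan does not spell out here but which underlies this point, is that training compute scales with the product of parameters and data, while quality improves only as a power law in each:

    % Rough standard approximations, added for context (not stated in the interview):
    % training compute grows with parameters $N$ times training tokens $D$,
    \[ C_{\text{train}} \approx 6\,N\,D \ \text{FLOPs}, \]
    % while loss falls only as a power law in each factor (a Chinchilla-style fit),
    \[ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \]
    % so once $D$ stops growing, further gains must come from $N$ alone, and each
    % increment of quality costs multiplicatively more compute.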

When data runs out, we can continue to scale up the model, but the returns may diminish. That said, the idea that we have exhausted the data is partly a misunderstanding: our use of video data is still very limited. Text is currently the most effective data domain, but video contains far more information. Moreover, pre-training is only part of model training; inference-time computation also matters.

And if data does run out, we can continue to improve models by creating synthetic data, as companies like OpenAI are trying: have the model generate large amounts of data, validate it functionally, filter out the effective samples, and train on them to improve performance. Although this method is still in its early stages and the investment is relatively small, it provides a new direction for model improvement.
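
As a rough illustration of that generate-then-filter idea, here is a minimal sketch; the function names and the toy arithmetic task are invented for illustration and do not describe any lab's actual pipeline:

    import random

    def build_synthetic_dataset(generate, prompts, n_samples, passes_check):
        """Sample many candidates per prompt, keep only those that pass a functional check."""
        kept = []
        for prompt in prompts:
            for _ in range(n_samples):               # heavy forward-pass (inference) work
                candidate = generate(prompt)
                if passes_check(prompt, candidate):  # e.g. unit tests or a math checker
                    kept.append((prompt, candidate))
        return kept                                  # the filtered data is what gets trained on

    # Toy stand-ins: the "model" answers addition prompts and is sometimes wrong on purpose.
    def toy_generate(prompt):
        return sum(prompt) + random.choice([0, 0, 0, 1, -1])

    def toy_check(prompt, answer):
        return answer == sum(prompt)                 # functional validation

    data = build_synthetic_dataset(toy_generate, [(2, 3), (10, 7)], n_samples=8, passes_check=toy_check)
    print(len(data), "verified examples kept for fine-tuning")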

Host: From an investment perspective, NVIDIA is receiving a lot of attention. But if the gains from pre-training have mostly been realized, why are people still building larger clusters?

Dylan: Although the one-time huge gains from pre-training may have passed, we can still obtain certain returns by increasing computing resources, especially in a competitive environment where companies want to enhance model performance to maintain competitiveness.

Additionally, the comparison between a company's models and its competitors' models also drives continued investment. Although from a return-on-investment perspective continuing to scale may be exponentially expensive, it can still be a rational decision, because the returns still exist, just with increased difficulty in obtaining them.

Moreover, with the emergence of new methods such as synthetic data generation, the pace of model improvement may accelerate, which also gives companies motivation to continue investing.

Host: In which areas is synthetic data most effective? Can you give some examples?

Dylan: Synthetic data is most effective in areas where functional validation can be performed. For example, Google's services have a large number of unit tests to ensure the system operates correctly, and these unit tests can be used to evaluate whether the outputs generated by an LLM are correct.

In fields like mathematics and engineering, outputs can be evaluated against clear standards, while in some subjective areas, such as art, writing style, or negotiation skills, it is difficult to perform functional validation because the criteria for judgment are subjective. For example, in image generation it is hard to say which image is more beautiful, as that depends on personal preference, whereas in mathematical calculation or engineering design it is straightforward to judge whether the output is correct.
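
To make the unit-test point concrete, here is a minimal sketch of using an existing test as the functional validator for model-generated code; the function and test are invented for illustration and are not Google's actual setup:

    def run_unit_test(candidate_source):
        """Return True only if the generated code defines is_even() and passes the test."""
        namespace = {}
        try:
            exec(candidate_source, namespace)        # run the model-generated code
            is_even = namespace["is_even"]
            assert is_even(4) and not is_even(7)     # the pre-existing unit test
            return True
        except Exception:
            return False

    good = "def is_even(n):\n    return n % 2 == 0\n"
    bad  = "def is_even(n):\n    return n % 2 == 1\n"
    print(run_unit_test(good), run_unit_test(bad))   # True False -> keep only the first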

Wall Street Underestimates Capital Expenditure of Large Data Centers

Host: What have you heard from hyperscale data centers? They all indicate that capital expenditure (capex) will increase next year and are building larger clusters. Is that true?

Dylan: According to our tracking and analysis, Wall Street's estimates of capex are often too low. We track every data center globally and find that companies like Microsoft, Meta, and Amazon are spending significantly on data center capacity.

They have signed leasing agreements for data centers for next year, expecting cloud revenue to accelerate growth as they are currently constrained by data center capacity. This indicates that they believe they can win in competition by scaling up, which is why they continue to invest.

Host: You previously mentioned the construction of large-scale clusters for pre-training. If the trend in pre-training changes, how will their construction for inference change?

Dylan: When training neural networks, forward passes produce outputs and backward passes update the weights. In the new paradigm of synthetic data generation, output evaluation, and model training, the forward-pass computation grows enormously, because a large number of candidate outputs must be generated, while the backward-pass computation stays relatively small, since training is done only on the few samples that pass validation. This means there is a large amount of inference computation during training; in fact, the inference computation during training exceeds the computation required to update the model weights.
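
A back-of-envelope comparison makes the imbalance clear. Using the common rule of thumb of roughly 2N FLOPs per generated token for a forward pass and roughly 6N per trained token for a forward-plus-backward pass, with all other numbers invented purely for illustration:

    N = 70e9                         # model parameters (illustrative)
    candidates_per_prompt = 64       # how many attempts are sampled per prompt
    tokens_per_candidate = 1000
    kept_per_prompt = 2              # only verified samples are trained on

    generation_flops = candidates_per_prompt * tokens_per_candidate * 2 * N
    training_flops   = kept_per_prompt       * tokens_per_candidate * 6 * N

    print(f"generation vs. training FLOPs: ~{generation_flops / training_flops:.0f}x")
    # With these assumptions, inference-style generation dominates the compute budget.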

Additionally, whether all components need to be in the same location during model training depends on the specific situation.

For example, Microsoft is building multiple data centers in different regions because it has found that it can distribute inference workloads across different data centers while updating models elsewhere, which allows for more efficient resource utilization. Therefore, the pre-training paradigm has not slowed down; rather, each generation of improvement costs exponentially more, and companies are looking for other ways to reduce costs and improve efficiency.

NVIDIA is Not Cisco in 2000

Host: Some people compare NVIDIA to Cisco in 2000. What do you think?

Dylan: There are some unfair aspects to this comparison. A large portion of Cisco's revenue came from private/credit-financed investment in telecommunications infrastructure, whereas NVIDIA's revenue mix is different, with a smaller proportion coming from private/credit-financed buyers, such as CoreWeave, which is backed by Microsoft.

In addition, during the internet bubble, the scale of private capital entering the field was much larger than it is now. Although the venture capital market seems active, private money (such as Middle Eastern sovereign wealth funds) has not yet flowed in at scale. Moreover, compared with Cisco's era, today's buyers are profitable companies with positive cash flow, different capital sources, and more rational investment. NVIDIA's current price-to-earnings ratio is around 30, still far from Cisco's 120 at the time, so a simple analogy cannot be made.

Inference-time reasoning is a new direction for scaling intelligence

Host: You mentioned that inference-time reasoning is a new direction for scaling intelligence and that its computational intensity is higher than pre-training. Can you elaborate on that?

Dylan: Pre-training may encounter diminishing returns or excessive costs, but synthetic data generation and inference time computation have become new development directions.

Inference time computation sounds good because it does not require spending more on training the model, but there are significant trade-offs in practice. Take GPT-4o as an example; it generates a large amount of data during inference, but ultimately only a portion of it is output to the user, consuming a lot of computational resources in the process.

For instance, when processing user requests, the model may generate thousands of intermediate results (tokens), but ultimately only outputs a few hundred to the user. This means that computational costs increase significantly, not only due to the increased number of generated tokens but also because more memory is needed to store contextual information (such as KV cache) when processing these tokens, which reduces the number of user requests the server can handle simultaneously, thereby increasing the cost per user.
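
To illustrate why the context and memory point matters, here is a rough KV-cache calculation; the layer counts, head counts, and memory budget below are invented but typical values, not any specific model's or server's real configuration:

    layers, kv_heads, head_dim = 80, 8, 128          # illustrative transformer shape
    bytes_per_value = 2                              # fp16/bf16
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # keys and values

    def concurrent_requests(context_tokens, memory_budget_gb=40):
        per_request_gb = kv_bytes_per_token * context_tokens / 1e9
        return int(memory_budget_gb // per_request_gb), per_request_gb

    for ctx in (2_000, 32_000):                      # short answer vs. long reasoning chain
        n, gb = concurrent_requests(ctx)
        print(f"{ctx:>6} context tokens -> {gb:.1f} GB of KV cache, ~{n} concurrent requests")
    # Longer chains of intermediate tokens inflate the KV cache, so each server
    # serves fewer users at once and the cost per request rises.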

From a cost perspective, for a company like Microsoft, if its inference revenue is $10 billion with a gross margin of 50-70%, the costs would be in the billions. When using models like GPT-4o, the cost may rise significantly due to increased inference computation costs. Although the model performs better and can charge higher fees, the increase in costs may exceed the increase in revenue.

The enterprise-level demand for the GPT-4o model is underestimated

Host: So is the market's enterprise-level demand for models like GPT-4o overestimated or underestimated?

Dylan: GPT-4o is still in its early stages, and people's understanding and application of it are not yet deep enough.

However, from some anonymous benchmark tests currently available, many companies (such as Google, Anthropic, etc.) are developing reasoning models and see a clear path to improving model performance by increasing computational resources. These companies' investment in inference is still relatively low and they are in the early stages, but there is significant room for improvement, and it is expected that within the next 6 months to 1 year there will be substantial gains in model performance on certain functionally validated benchmarks. Therefore, the market's demand potential for such models is enormous, but it is currently difficult to assess accurately.

Host: Looking back at the internet wave, many startups initially relied on Oracle and Sun Microsystems' technology, but the situation changed five years later. Will this happen in the AI chip field?

Dylan: Currently, GPT-4o is very expensive, but if the model size is reduced, the cost will drop significantly.

For example, costs can be greatly reduced from GPT-4o down to Llama 7B. For smaller models, inference is relatively easy and can run on a single chip, leading to intense market competition, with many companies offering API inference services based on models like Llama, resulting in fierce price competition and lower profit margins.

In contrast, companies like Microsoft that use OpenAI models have higher gross margins (50-70%) because they possess high-performance models and have enterprises or consumers willing to pay high fees for them.

However, as more companies enter the market, model differentiation becomes increasingly important. Only those with the best models and the ability to find enterprises or consumers willing to pay for them can stand out in the competition. Therefore, the market is rapidly filtering, and ultimately, only a few companies may be able to compete in this field.

Google and Amazon chips each have their strengths and weaknesses

Host: So how is AMD doing among these competing companies?

Dylan: AMD excels in chip engineering but has significant shortcomings in software. They lack sufficient software developers and have not invested in building GPU clusters to develop software, which contrasts sharply with NVIDIA.

Additionally, AMD has been focused on competing with Intel and lacks system-level design experience. Although they acquired ZT Systems, they are still behind NVIDIA in system architecture design for large-scale data centers.

Hyperscale data center customers (such as Meta and Microsoft) are helping AMD improve software and understand model development, inference economics, etc., but AMD still cannot compete with NVIDIA on the same timeline. It is expected that AMD's AI revenue share among customers like Microsoft and Meta will decline next year, but they will still profit from the market, just not achieve the massive success that NVIDIA does.

Host: What about Google's TPU? It seems to be the second choice after NVIDIA.

Dylan: Google's TPU has its uniqueness in system and infrastructure. While the performance of a single TPU is good, its system design is more important. The TPU system built by Google in collaboration with Broadcom is competitive in chip interconnect, network architecture, etc., and even surpasses NVIDIA in some aspects.

Moreover, Google has adopted water cooling technology for many years, improving system reliability, while NVIDIA only recently realized the need for water cooling technology.

However, Google's TPU has had relatively limited commercial success, mainly because its software is not open enough: much of the internally used software (such as what DeepMind uses) is not available to Google Cloud users. In terms of pricing, the official list price is relatively high and even negotiated prices lack competitiveness; compared with other cloud service providers (such as Oracle, Microsoft, and Amazon), Google's TPU pricing has no advantage.

In addition, Google uses a large number of TPUs for internal services (such as Search and Gemini applications), so its share of the external rental market is small and mainly serves Apple; Apple's renting of TPUs may be related to its attitude toward NVIDIA (there may be a competitive relationship, though the specific reasons were not discussed).

Host: What about Amazon? Can you provide a detailed introduction to Amazon's chips like you did for Google's TPU?

Dylan: Amazon's chips can be thought of as the "Amazon Basics TPU." They have cost-effectiveness advantages in some respects, such as using more silicon and memory, with networking capabilities comparable to TPUs, but there are shortcomings in efficiency, such as using more active cables (the TPUs Google builds with Broadcom use passive cables) and lower silicon-area utilization.

However, by cutting costs Amazon gains an advantage in HBM memory bandwidth per dollar, and its chip prices are far lower than NVIDIA's. Although its chips trail NVIDIA on technical specifications (memory, bandwidth, etc.), they are attractive for cost-sensitive application scenarios.

Amazon has partnered with Anthropic to establish a supercomputer system containing 400,000 chips, believing that large-scale chip deployment is useful for inference and model improvement. Although it may not be the most advanced technically, its cost-effectiveness makes it a reasonable choice for Amazon.

Capital expenditure for next year is clear; uncertainty starts in 2026

Host: Looking ahead to 2025-2026, what are your views on the semiconductor market? For example, Broadcom's recent stock price increase and NVIDIA's stock price fluctuations, how do you think the market will develop?

Dylan: Broadcom has achieved some results in the custom ASIC field, such as winning multiple custom ASIC orders, including orders from companies like Google. Google is working to enhance the performance of its custom chips, especially in recommendation systems. Additionally, companies like OpenAI are also developing their own chips, and Apple has some chips produced in collaboration with Broadcom. These trends indicate that market competition will become more intense.

From an overall market perspective, hyperscale data centers plan to significantly increase spending next year, which will drive the development of the entire semiconductor ecosystem (including network equipment suppliers, ASIC suppliers, system suppliers, etc.).

However, there is some uncertainty regarding the situation in 2026.

On one hand, whether model performance can continue to improve will be a key factor. If the pace of model performance improvement slows down, it may lead to market adjustments, since current market growth largely depends on continuous advances in model performance and the resulting demand for computing resources.

On the other hand, capital investment is also an important variable. Currently, sovereign wealth funds from the Middle East and pension funds from Singapore, the Nordic countries, and Canada have not yet entered the market at scale, but if they decide to invest significant amounts in the future, it will have a substantial impact on the market.

In addition, the new cloud market will face consolidation. Among the approximately 80 new cloud service providers we are tracking, only a few (5-10) are likely to survive in the competition. Among them, 5 are sovereign cloud service providers, and about 5 are competitive enterprises.

Currently, the GPU leasing market is experiencing rapid price changes, such as the significant drop in leasing prices for NVIDIA H100. Not only is there fierce competition among new cloud service providers, but the on-demand GPU pricing from large cloud service providers like Amazon is also declining rapidly. The proportion of enterprises purchasing GPU clusters remains relatively low; they prefer to outsource their GPU computing needs to new cloud service providers, but this situation may change with market consolidation.

For NVIDIA, although it faces competition, there is still an opportunity to dominate the market if it can maintain technological leadership and launch products with better performance and lower costs. For example, although the cost of their upcoming products is higher than that of previous generations, there is still potential for growth through performance optimization and price strategy adjustments. However, if market demand does not grow as expected or if more competitive alternatives emerge, NVIDIA's revenue may be affected.

Host: Thank you very much, Dylan, for today's sharing, which has given us a deeper understanding of the semiconductor industry's development in the AI field. We hope to continue to monitor the dynamics in this area in the future and look forward to seeing how various companies perform in this market full of opportunities and challenges. Thank you again!

Dylan: Thank you, I'm glad to share my views here.

Host: Just a reminder, the above content represents our views only and does not constitute investment advice.