Inference now makes up as much as 40% of NVIDIA's business! The battle over AI chip "inference" is underway.

Wallstreetcn
2024.02.29 03:37

The main battlefield for AI chips is shifting from training to inference. Jensen Huang: inference now accounts for 40% of NVIDIA's business, a sign that AI is finally making it.

The main battlefield for AI chips is shifting toward a larger and more fiercely contested arena - inference.

Colette Kress, NVIDIA's CFO, said last week that in the data center segment, NVIDIA's largest source of revenue, more than 40% of last year's business involved deploying AI models rather than training them - a significant sign that the AI chip market is shifting.

Demand for chips to train AI models has propelled NVIDIA to the position of "new king of chips". As the industry evolves rapidly, the market's next focus will be chips for real-time inference.

Compared with training chips, inference chips have lower power-consumption and cost requirements, which lowers the barrier to entry and means competition will be fiercer. Can NVIDIA's "new king" status endure?

From GPU to LPU

During the training phase, companies often feed massive amounts of data to train large-scale neural networks. Due to requirements such as high computational density, low power consumption, and large memory bandwidth, most AI models currently rely on NVIDIA's GPUs for training.

During the inference phase, the trained neural network is used to make predictions in response to user prompts. These chips (LPUs) do not need the all-round performance of a GPU, but they offer stronger inference-engine performance.
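To make the two phases concrete, here is a minimal PyTorch-style sketch with a toy model and random data (an illustration of the general workflow, not any vendor's actual workload): training repeatedly feeds data through the network and updates its weights, while inference is a single, gradient-free forward pass.

```python
import torch
import torch.nn as nn

# Toy network standing in for a large AI model (illustrative only).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

# --- Training phase: feed data, compute loss, update weights repeatedly ---
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for step in range(100):                       # many passes over (random) data
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                           # gradient computation dominates cost
    optimizer.step()

# --- Inference phase: frozen weights, one forward pass per user request ---
model.eval()
with torch.no_grad():                         # no gradients, far less compute
    prediction = model(torch.randn(1, 16))
```

The asymmetry is the point: training needs sustained high-bandwidth compute over huge datasets, while inference repeats a much cheaper forward pass at serving time, which is why the hardware requirements diverge.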

For AI chip manufacturers, inference chips are becoming increasingly important and are capturing more opportunities in the market.

According to reports, analyst Ben Reitzes from Melius Research stated in a client note:

"Some views believe that NVIDIA's market share in the inference field will be lower than in the training field."

"This means that the upcoming 'inference explosion' may bring a wave of profits."

Bank of America analyst Vivek Arya also believes that as the surge of investment in training AI models gives way to a focus on generating revenue from them, the inference field will be more competitive than the training field NVIDIA dominates.

Based on NVIDIA's "40%" proportion, the development speed of inference technology may be much faster than previously expected. Earlier this year, analysts at UBS predicted that by next year, 90% of chip demand will come from training, with inference chips accounting for only 20% of the market.

NVIDIA's competitors are gearing up.

Some believe that as customers focus more on cutting the operating costs of AI models, Intel's chips will become more attractive. Reportedly, the kinds of chips Intel specializes in are already widely used for inference and, in practical inference applications, differ little from NVIDIA's more advanced and expensive H100.

Intel CEO Pat Gelsinger put it this way in an interview late last year: from an economic perspective, he would not build a backend environment full of $40,000 H100s, because they consume too much power and require building new management and security models and new IT infrastructure. If he could run these models on standard Intel chips, those issues would not arise.

Beyond established chip giants like Intel and AMD, some startups may also rise to prominence. Groq, founded by former Google AI chip engineer Jonathan Ross, is one of the challengers. The company claims its LPU delivers the fastest large-model inference in history, generating around 500 tokens per second and far outpacing the roughly 40 tokens per second cited for GPT-4.
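For a sense of what that throughput gap means for users, here is a quick back-of-envelope comparison using the two figures above (the 1,000-token response length is a hypothetical value chosen purely for illustration):

```python
# Time to generate one response at the throughputs cited in the article.
response_tokens = 1_000  # hypothetical response length, for illustration only

for name, tokens_per_sec in [("Groq LPU (claimed)", 500), ("GPT-4 (cited)", 40)]:
    seconds = response_tokens / tokens_per_sec
    print(f"{name}: {seconds:.0f} s per {response_tokens}-token response")

# Groq LPU (claimed): 2 s; GPT-4 (cited): 25 s - a 12.5x latency gap.
```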

However, cost remains a hurdle. Giants like Amazon, Google, and Microsoft have been developing inference chips in-house to reduce operating costs.

Ross pointed out:

"For the inference field, how much you can deploy depends on cost."

"At Google, many models can be successfully trained, but 80% of them cannot be deployed because the production cost is too high."

Rodrigo Liang, CEO of chip software startup SambaNova, stated:

"We see rapid growth in our inference application cases."

"People are beginning to realize that over 80% of the cost will be used for inference, so I need to find alternative solutions."

For now, NVIDIA remains in the lead through this transition. Reportedly, an upcoming NVIDIA chip achieved industry-leading results in a key AI inference benchmark last year, extending the company's years-long dominance in such competitions.

Moreover, NVIDIA's latest earnings report shows the company still holds more than 80% of the AI chip market, which suggests demand for its training chips will remain strong for the foreseeable future.

On February 23, NVIDIA CEO Jensen Huang said in an interview with Wired that inference now accounts for 40% of NVIDIA's business, a sign that AI models are being put into real-world use. He said:

"We love inference. If I were to estimate, I think NVIDIA's current business composition is probably 40% inference and 60% training. Why is this a good thing? Because it means that artificial intelligence is finally making it." Today, whenever you input a prompt into the cloud, it generates something - it could be a video, an image, 2D or 3D graphics, text, or charts - behind all this, there is likely an NVIDIA GPU at work.