
Google Plans Dedicated Inference Chips: A Decade in the Making, TPU Challenges NVIDIA's Dominance Across the Board
Google is set to unveil its next-generation TPU (inference chip) at the Google Cloud Next conference and has already secured major orders from giants like Meta and Anthropic. Analysts note that while NVIDIA's position in training is unquestioned, Google may hold an advantage in inference.
Google is pushing its self-developed chip business toward a new frontier of competition. It plans to launch custom chips designed specifically for AI inference tasks, further challenging NVIDIA's market dominance after reaching large-scale cooperation agreements with both Meta and Anthropic.
According to Bloomberg, Google plans to unveil the next generation of Tensor Processing Units (TPUs) at the Google Cloud Next conference being held this week in Las Vegas.
Jeff Dean, Google's Chief Scientist, stated in an interview that "given the rising demand for rapid processing of AI queries, it is now reasonable to design chips more specialized for training or inference workloads."
The move comes as the AI chip market evolves at an accelerating pace. NVIDIA GPUs remain the industry benchmark, especially for model training, but the battle for the inference market is intensifying.
Chirag Dekate, an analyst at market research firm Gartner, pointed out, "The battlefield is shifting towards inference, and on this front, Google possesses infrastructure advantages."

From Internal Tool to Industry Hit: The TPU's Path to Breakout
Google's chip effort spans more than a decade. It began with a practical problem: Google needed computing power to support its language translation and speech recognition services, but no chip or hardware available on the market could do the job at an affordable cost.
According to Amin Vahdat, the Google vice president who oversees its infrastructure work, the core philosophy behind the TPU was "to solve a small number of problems, but those problems require massive computation." The mainstream view at the time was that custom hardware for this purpose wasn't worth building, yet Google chose to go against the tide.
Throughout this phase, Google's chip R&D co-evolved closely with its AI model work. The landmark 2017 research paper introducing the Transformer architecture, which catalyzed contemporary large language models, prompted the TPU team to shift its focus toward chip designs serving the training of larger-scale AI systems.
Later, Google DeepMind and the chip team noticed that TPUs sat idle for significant stretches when running reinforcement learning tasks, so they adjusted the inter-chip network to speed data flow and keep the processors busy.
This internal feedback mechanism also brought stronger control over "hardware-level errors."
Paul Barham, a Google scientist and co-lead of the Gemini infrastructure team, explained that when AI accelerator chips churn through massive volumes of mathematical operations, even a minor fault can propagate and cause a model to "completely collapse." "Now we can screen hundreds of thousands of accelerator chips within 10 seconds," he said.
Major Clients Sign On in Quick Succession as Commercial Momentum Builds
The commercial breakthrough for Google's chip business came swiftly as well. Last October, Anthropic announced an expanded agreement with Google to acquire up to one million TPUs; soon after, Google's newly released Gemini model, trained and run on TPUs, drew widespread praise.
Demand continued to grow thereafter. Meta signed a multi-year TPU cloud service agreement worth billions of dollars. Santosh Janardhan, Head of Infrastructure at Meta, stated, "It appears there may be advantages in inference," while also noting that "new platforms inevitably come with thresholds and learning curves."
Market-making firm Citadel Securities plans to present at this year's Google conference on how it achieved faster model training with TPUs than with its previous GPU setup.
Talal Al Kaissi, interim CEO of Core42, the cloud business arm of Abu Dhabi-based technology group G42, said the company has held "multiple rounds of consultations" with Google about using TPUs and remains optimistic.
Efforts to strengthen the software ecosystem are advancing in parallel. Google now lets TPU customers use external tools such as PyTorch and third-party scheduling software rather than relying entirely on Google's proprietary stack; it is also testing arrangements that let partners like Anthropic deploy some TPUs in their own data centers rather than in Google facilities.
NVIDIA's Counterattack and a Rebalancing Market
NVIDIA is hardly standing still. Last month it launched an inference chip built on technology acquired from Groq. Jensen Huang emphasized the versatility of his company's chips, saying they can handle "a wide range of applications that many TPUs cannot manage."
In actual deployment, Google itself relies on both TPU and GPU chips. Demis Hassabis, CEO of Google DeepMind, pointed out that top AI labs show particular interest in TPUs, "Many people hope to run on both platforms simultaneously."
Google's advantage lies in more than a decade of chip-design experience, ample capital, and first-hand insight into AI models. Among top AI developers, Google is the only one to mass-produce its own chips, enabling efficient two-way feedback between its hardware and model teams.
Natalie Serrino, co-founder of Gimlet Labs, said existing TPUs are well suited to the workloads of emerging AI agents: "For this rapidly growing class of tasks, they are an excellent tool."
A Three-Year Development Cycle Collides with Rapid AI Iteration
One constraint on Google's chips is that taking a design from R&D to mass production takes about three years, while AI models evolve far faster, making it extremely difficult to predict customers' future needs accurately.
Barham raised another concern: if the feedback loop between the hardware and model teams grows too tight, teams may optimize only for the fit between current hardware and software and miss more radical new ideas.
To balance the two, the TPU team sometimes designs chips to be "good enough" across multiple use cases rather than optimized to the extreme for a single purpose; another strategy is to pursue two different designs in parallel and choose the final implementation based on actual needs.
Vahdat's words perhaps best summarize Google's long-term considerations in chip strategy:
"Producing TPUs solely for Google has benefits, but also significant drawbacks. Ultimately, you get trapped in what we call a 'technology island.' It might be a beautiful island, but with limited residents and diversity, it may eventually hinder development."
