Barclays predicts that as AI applications become widespread, inference is expected to account for more than 70% of AI computing demand by 2026, and meeting that demand may require four times the chip capital expenditure currently expected.
Where is the next breakthrough in the AI boom? Barclays offered an answer in a research report released on the 22nd, laying out an "AI roadmap" that charts the technology's likely path of evolution.
The report argues that AI adoption will pass through three major stages: first, the current "chatbot/assistant era"; then the "AI agent era," unfolding gradually from 2025 to 2026; and finally the "digital employee and robot era" beginning in 2027.
Barclays also predicts that as AI applications become widespread, inference will account for more than 70% of AI computing demand by 2026, and that meeting this demand may require four times the chip capital expenditure currently expected.
Three Clearly Defined Stages
Barclays defines the current stage (2023-2024) as the "chatbot/assistant era," characterized by the widespread use of chatbots (such as ChatGPT and Meta AI) and some early AI assistants (such as Copilot).
At this stage, although performance continues to improve at the model and infrastructure levels, applications remain limited: most are experimental in nature and have yet to find broad market fit.
"In this first stage, most investors' returns are allocated to hardware infrastructure providers, similar to the early stages of internet and mobile construction."
The report notes that leading AI products such as ChatGPT and Meta AI have surpassed 200 million monthly active users, yet this represents only about 10% of the global consumer mobile app market.
Next, Barclays expects the "AI agent era" to arrive in 2025-2026, centered on the widespread deployment of AI agents that can complete tasks autonomously.
Unlike chatbots and assistants, AI agents can complete complex tasks by chaining multiple requests while reducing the need for direct human intervention. Behind this shift is a surge in demand for AI inference computing.
Barclays predicts that by 2026, the demand for AI inference computing will account for over 70% of the overall computing demand.
"Investment returns may move up to the application layer (or possibly the model and API layer, although commercialization and competition risks may increase as early innovators' leads erode). Unlike the chatbot era, large enterprises may see very strong returns in the agent era, as inference revenue surges with adoption."
Finally, Barclays believes that from 2027 onwards, AI technology will further enter the "digital employee and robot era."
In enterprise applications, AI agents may evolve into "digital employees" capable of independently completing tasks, while in the consumer market, intelligent robots will gradually integrate into household life, taking on simple and repetitive daily tasks.
Barclays predicts that by then, AI adoption will reach internet scale, surpassing 4 billion users. Investment returns in this era should accrue at the application layer, though they are "difficult to predict today."
Surging Demand for Inference Computing, Capital Expenditure Four Times Higher Than Current Consensus
Barclays emphasizes that the demand for inference computing in the coming years will significantly exceed market expectations, driven by the rise of a new generation of AI products and services.
The report points out that in the three stages of AI development, each stage imposes higher requirements on inference computing.
In the coming years, computing demand will outstrip supply. Barclays estimates that the GPU and ASIC capacity required for training and inference will be 250% above current consensus forecasts by 2025, and 14 times above them by 2027. The core drivers of this shift are the widespread adoption of consumer- and enterprise-grade AI assistants, along with the proliferation of higher-performance, multimodal AI.
It is worth noting that Barclays expects chip capital expenditure in 2026 will need to be four times the current consensus.
On the supply side, the report notes that NVIDIA currently holds about an 80% share of the inference computing market, but that share may fall to 50% by 2028, partly because large cloud providers are launching their own custom ASICs to capture a larger portion of inference workloads.
Bright Profit Prospects for AI Products
In addition to technical and market analysis, Barclays also conducted a detailed evaluation of the cost-effectiveness of AI products.
The report notes that the unit cost of inference computing is falling rapidly. Taking OpenAI as an example, Barclays estimates the company has cut its inference costs by more than 90% in 18 months. Going forward, the unit economics of AI products and services should improve significantly, especially for those built on open-source large models.
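To put the quoted figure in perspective, a back-of-envelope calculation (assuming, hypothetically, that the 90% reduction over 18 months reflects a smooth exponential decline rather than discrete price cuts) yields the implied monthly rate of cost decline:

```python
def monthly_decline_rate(total_reduction: float, months: int) -> float:
    """Constant per-month fractional cost decline implied by a total reduction."""
    remaining = 1.0 - total_reduction           # fraction of original cost left
    return 1.0 - remaining ** (1.0 / months)    # per-month fractional decline

# 90% total reduction over 18 months, per Barclays' estimate
rate = monthly_decline_rate(0.90, 18)
print(f"Implied decline: ~{rate:.1%} per month")  # ≈ 12% per month
```

A steady decline of roughly 12% per month compounds to the 90% total reduction cited in the report, illustrating how quickly per-query economics can improve.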
Although AI companies are widely assumed to be loss-making, OpenAI is actually profitable on a per-model basis. Barclays estimates that OpenAI's GPT-4 has generated nearly $2 billion in profit over the past two years through ChatGPT premium subscriptions and API fees, against development costs of only $100-200 million. OpenAI's revenue is expected to keep growing, which could mark a turning point for the industry's development.
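A quick sanity check on those estimates (all figures are Barclays' own, not audited results) shows the implied return multiple on development cost:

```python
# Barclays' estimated figures for GPT-4 (USD)
dev_cost_low, dev_cost_high = 100e6, 200e6  # estimated development cost range
profit = 2e9                                # estimated two-year profit

multiple_conservative = profit / dev_cost_high  # if development cost was $200M
multiple_optimistic = profit / dev_cost_low     # if development cost was $100M
print(f"Profit covers development cost {multiple_conservative:.0f}x-"
      f"{multiple_optimistic:.0f}x over")  # 10x-20x
```

Even under the conservative cost assumption, the estimated profit is an order of magnitude above the model's development cost, which is the basis for the report's per-model profitability claim.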
Looking ahead, Barclays believes that the AI industry is at a crucial turning point. The introduction of AI agents will not only significantly increase the demand for inference computing, but also bring new growth opportunities to the enterprise and consumer markets. By 2026, the number of daily active users of consumer AI is expected to exceed 1 billion, while the penetration rate of enterprise agents is projected to reach 5%.
Barclays points out that another significant feature of this trend is that future AI products will run mainly in the cloud, with only a small share of processing done on local devices such as phones and PCs. In particular, because AI agents often need to make multiple model calls to handle a single user query, they will further drive demand for cloud-based inference computing.