Surpassing NVIDIA's H100! Intel launches the new generation AI chip Gaudi 3, enhancing large model training and inference capabilities by 50%

Intel has released its new generation AI chip, Gaudi 3, which it says outperforms NVIDIA's competing H100. Compared with the H100, Gaudi 3 can reduce the training time of Llama2 and GPT-3 models by 50% and increase the inference throughput of Llama and Falcon models by 50%. Intel will also collaborate with multiple enterprises to build an open platform for enterprise AI.

Author: Li Dan

Source: Hard AI

NVIDIA's dominance in artificial intelligence (AI) faces a new challenge as Intel launches a new generation AI chip that it claims outperforms NVIDIA's competing product.

On Tuesday, April 9th, Eastern Time, during this year's Intel Vision 2024 customer and partner conference, Intel officially released the third generation Intel AI accelerator, Gaudi 3. Intel stated that it will bring high performance, openness, and new choices for enterprise applications in generative AI. At the same time, Intel also introduced a new open scalable system, next-generation products, and strategic partnerships to accelerate the application of generative AI.

When Intel CEO Pat Gelsinger announced the new AI data center and PC chips last year, he hinted at the launch of Gaudi 3 for deep learning and large generative AI models. Intel claimed at that time that Gaudi 3's performance would surpass NVIDIA's flagship AI chip, the H100.

During the Intel Vision 2024 conference on Tuesday, Gelsinger stated, "Innovation is moving forward at an unprecedented pace, all driven by chips - every company is rapidly becoming an AI company. Intel is bringing AI to every corner of the enterprise, from PCs to data centers to the edge. Our latest Gaudi, Xeon, and Core platforms are providing a tightly integrated flexible solution designed to meet the evolving needs of customers and partners and fully leverage the huge opportunities of the future."

Gaudi 3 can reduce model training time by 50% compared to H100 and increase inference throughput by 50%

Intel said that Gaudi 3 is manufactured using 5-nanometer process technology, is designed for efficient large-scale AI computing, and supports scaling to tens of thousands of accelerators connected over standard Ethernet. For enterprises looking to scale up generative AI, Gaudi 3 can achieve a significant leap in performance and productivity in AI training and inference for large language models (LLMs) and multimodal models.

Compared to its predecessor, Gaudi 3 can provide four times the BF16 AI computing power, a 1.5x increase in memory bandwidth, and a two-fold increase in network bandwidth to support large-scale system expansion. Compared to NVIDIA's H100 chip, Gaudi 3 is expected to reduce the training time of Meta's Llama2 models with 7B and 13B parameters, as well as OpenAI's GPT-3 model with 175B parameters, by an average of 50%.

Furthermore, when applied to Llama models with 7B and 70B parameters, as well as the open-source Falcon model with 180B parameters, Gaudi 3 is expected to have an average 50% higher inference throughput and 40% higher inference efficiency compared to H100. Moreover, Gaudi 3 demonstrates greater inference performance advantages on longer input and output sequences.

When applied to Llama models with 7B and 70B parameters, as well as the Falcon model with 180B parameters, Gaudi 3 is projected to have a 30% higher inference speed compared to NVIDIA's H200.
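The training and inference figures above are both quoted as "50%", but they measure different things: one is a reduction in time, the other an increase in throughput. A minimal sketch of the arithmetic follows; the function names are illustrative and the inputs are simply the percentages Intel cited, not independent measurements.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Speedup factor implied by cutting run time by `reduction` (0 <= reduction < 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_saved_from_throughput_gain(gain: float) -> float:
    """Fraction of time saved on a fixed workload when throughput rises by `gain`."""
    if gain < 0:
        raise ValueError("gain must be non-negative")
    return 1.0 - 1.0 / (1.0 + gain)


# Claimed training figure: 50% less time than the H100.
training_speedup = speedup_from_time_reduction(0.50)

# Claimed inference figure: 50% more throughput than the H100.
inference_time_saved = time_saved_from_throughput_gain(0.50)

print(f"Training: {training_speedup:.1f}x the H100's speed")
print(f"Inference: about {inference_time_saved:.0%} less time for the same workload")
```

Under these figures, halving training time corresponds to roughly twice the H100's training speed, while 50% higher inference throughput corresponds to about one third less time for a fixed batch of requests.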

Intel stated that Gaudi 3 will be available to OEMs including Dell, HPE, Lenovo, and Supermicro in the second quarter of this year and will be supplied to customers in the third quarter. Intel did not disclose the price range of Gaudi 3.

Das Kamhout, Vice President of Intel's Xeon Software, said that Intel expects Gaudi 3 to be highly competitive against NVIDIA's latest products, citing competitive pricing and the use of industry-standard Ethernet in its open, integrated on-chip network. He expressed confidence in the strength of the product.

Intel AI Solution Customers Include IBM; Google Cloud to Use Confidential Computing Features

During Intel Vision 2024, Intel introduced its strategic open scalable AI system, encompassing hardware, software, frameworks, and tools.

Intel stated that this approach enables participants in the AI field to build a broad, open ecosystem, providing solutions that meet enterprise-specific generative AI needs. Intel's strategic service customers include device manufacturers, database providers, system integrators, software and service providers, such as NAVER using Gaudi chips, IBM deploying fifth-generation Xeon processors in their database applications, and CtrlS Group collaborating with Indian customers to build AI supercomputers.

Intel also announced collaborations with Google Cloud, Thales, and Cohesity, where these three partners will utilize Intel's confidential computing features in their cloud instances.

Collaborating with Multiple Enterprises to Build an Open Platform for Enterprise AI

During Intel Vision 2024, Intel also announced that it is collaborating with Anyscale, Articul8, DataStax, Domino, Hugging Face, KX Systems, MariaDB, MinIO, Qdrant, Red Hat, Redis, SAP, VMware, Yellowbrick, and Zilliz to create an open platform for enterprise AI.

Intel stated that this industry-wide effort aims to develop an open, multi-vendor generative AI system. By leveraging retrieval-augmented generation (RAG), enterprise users can achieve top-notch performance and easy deployment of generative AI. With RAG, enterprises can augment their large stores of existing proprietary data, running on standard cloud infrastructure, with open LLM capabilities, thereby accelerating the application of generative AI in enterprises.
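For readers unfamiliar with the pattern, RAG can be sketched in three steps: retrieve the passages most relevant to a question, add them to the prompt, and pass that prompt to a language model. The toy example below is an illustration only, not Intel's reference implementation. The documents, the word-overlap scoring, and the `generate` stub are all assumptions made for this sketch; a real deployment would typically use embeddings, a vector database, and an actual LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased bag-of-words counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Step 1: return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def augment(query: str, passages: list[str]) -> str:
    """Step 2: build a prompt that places the retrieved passages before the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 3: hypothetical stub standing in for a call to an LLM."""
    return f"[model output for a {len(prompt)}-character prompt]"


# Hypothetical proprietary documents, invented for illustration.
DOCS = [
    "Expense reports must be filed within 30 days of travel.",
    "The VPN client is required for remote access to the build servers.",
    "Quarterly security training is mandatory for all staff.",
]

query = "How many days do I have to file an expense report?"
passages = retrieve(query, DOCS, k=1)
prompt = augment(query, passages)
print(prompt)
print(generate(prompt))
```

The point of the pattern is that the proprietary data stays in the enterprise's own store and is supplied to the model at query time, rather than being baked into the model through retraining.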

As a first step in this effort, Intel will release a reference implementation of generative AI pipelines based on secure Intel Xeon and Gaudi solutions, publish a technical concept framework, and continue to increase infrastructure capacity in the Intel Tiber Developer Cloud to enable the development and validation of the RAG ecosystem and future pipelines.

Launching the 6th generation Intel Xeon processor for enterprise AI

In addition to the Gaudi 3 accelerator, Intel also introduced another piece of hardware: the 6th generation Intel Xeon processor. It provides a high-performance option for running current generative AI solutions, including RAG. Aimed at enterprise AI workloads in general, it will be available in the second quarter of this year.

Intel said that, compared to the 2nd generation Intel Xeon processor, the 6th generation Xeon processor code-named Sierra Forest delivers four times the performance per watt and 2.7 times the rack density.

The 6th generation Xeon processor code-named Granite Rapids, which includes software support for the MXFP4 data format, can reduce next-token latency by up to 6.5 times compared to the 4th generation Xeon processor using FP16. It can also run the Llama-2 model with 70 billion parameters.