
Unlocking the next big opportunity in storage! Korean media details Jensen Huang's "mysterious inference context memory platform"

NVIDIA unveiled the "Inference Context Memory Platform" (ICMS) at CES 2026, shifting the focus of AI hardware toward efficient storage. The platform uses DPUs to manage large-capacity SSDs, addressing the surging demand for the "KV cache" in AI inference and turning NAND flash and SSDs into core computing components. This directly energizes the enterprise storage market, bringing large new orders to leading manufacturers such as Samsung and SK Hynix and opening a new cycle for the industry.
At the 2026 International Consumer Electronics Show (CES) on January 5, NVIDIA CEO Jensen Huang unveiled a new hardware platform called the "Inference Context Memory Platform" (ICMS), aimed at the explosive data storage demands of the artificial intelligence inference phase. The move signals a shift in AI hardware architecture from merely stacking compute toward efficient context storage, with NAND flash memory and SSDs expected to take over from HBM as the next key growth engine.
A Korea Economic Daily article on January 24 reported that Huang showcased a mysterious black rack, referred to as the "Inference Context Memory Platform" (ICMS), during his keynote. This is not an ordinary hardware update but a key innovation aimed at solving the data bottleneck of the AI inference phase, and the reporter noted that it could be the storage industry's next breakout after HBM (High Bandwidth Memory).
The core logic of the platform lies in addressing the "KV cache" (key-value cache) problem of AI inference. As AI moves from pure training to large-scale inference, data volumes have exploded, and existing GPU-memory and server-memory architectures struggle to keep up. NVIDIA aims to break through this physical limit by introducing its new data processing unit (DPU) together with massive SSDs to build a large cache pool.
This technological shift is unquestionably good news for the South Korean storage giants Samsung Electronics and SK Hynix. Reports suggest that as ICMS rolls out, NAND flash will enter a "golden age" like HBM's. That implies not only a surge in demand for storage capacity but also a fundamental change in storage architecture: GPUs may bypass CPUs and talk directly to storage devices at high speed.
Explosive Growth of KV Cache Triggers Storage Anxiety
The Korean media article pointed out that Huang's core motivation for introducing ICMS is the surge in the "KV cache." In the AI inference era, the KV cache is what lets AI understand dialogue context and perform logical reasoning. For example, when a user asks the AI a complex, subjective question about G-Dragon, the AI needs to draw on its internal model data and the historical dialogue context (i.e., the KV cache) to allocate attention weights and reason, avoiding redundant calculation and hallucinations.
As AI shifts from pure training to inference and application scenarios expand to multimodal workloads, the amount of data to be processed is growing in irregular bursts. NVIDIA found that relying solely on expensive HBM or conventional DRAM cannot accommodate these massive KV caches, and that servers' existing internal storage architecture is inadequate for the coming inference era. A dedicated storage platform that can hold vast amounts of data while keeping access efficient has therefore become a necessity.
A DPU-Driven 9600TB Storage Pool
According to the Korean media article, the core of the ICMS platform lies in pairing DPUs with ultra-large-capacity SSDs. The article relayed NVIDIA's introduction that the platform uses the new "BlueField-4" DPU as the "administrative logistics officer" for data movement, relieving the burden on CPUs. A standard ICMS rack contains 16 SSD trays, each fitted with 4 DPUs that together manage 600TB of SSD (150TB per DPU), giving a single rack an astonishing total capacity of 9600TB.
This capacity far exceeds that of traditional GPU racks. By comparison, a Vera Rubin GPU platform consisting of 8 racks has a total SSD capacity of approximately 4423.68TB. Jensen Huang stated that the ICMS platform virtually increases the memory capacity available to each GPU from roughly 1TB to 16TB. In addition, thanks to the performance gains of BlueField-4, the platform achieves a KV cache transfer speed of 200GB per second, addressing the network-transfer bottleneck that has limited large-capacity SSDs.
Opening the Golden Age of NAND Flash Memory
The article points out that the ICMS platform primarily utilizes SSDs, which directly benefits NAND flash manufacturers. In the past few years, although AI has been booming, the spotlight has mainly been on HBM, while NAND flash and SSDs have not received the same level of attention.
NVIDIA positions the platform as a "tier 3.5" of storage that sits between the local SSDs inside servers and external storage. Compared with expensive, power-hungry DRAM, SSDs managed by high-performance DPUs offer large capacity, fast access, and data retention through power loss, making them an ideal home for KV caches.
This architectural shift directly benefits Samsung Electronics and SK Hynix. Because ICMS demands extremely high storage density, market demand for enterprise-grade SSDs and NAND flash is expected to surge. NVIDIA is also advancing the "Storage Next" initiative (also known as SCADA, Scaled Accelerated Data Access), which aims to let GPUs bypass CPUs and access NAND flash directly, further eliminating data-transfer bottlenecks.
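NVIDIA has not published an API for ICMS or SCADA, but the general idea of a GPU pulling data from an NVMe SSD without bouncing it through CPU memory can already be sketched with NVIDIA's existing GPUDirect Storage stack via the kvikio Python bindings. The sketch below is purely illustrative: the file path and block size are made up, and on systems without GPUDirect Storage kvikio transparently falls back to a CPU bounce-buffer path.

```python
# Illustrative only: a GPU-direct read of a previously offloaded KV-cache blob using
# NVIDIA's existing GPUDirect Storage stack (kvikio + CuPy). This is NOT the ICMS/SCADA
# API, which has not been published; it just shows the "GPU reads the SSD without a CPU
# bounce buffer" idea that Storage Next aims for.
import cupy as cp
import kvikio

NBYTES = 256 * 1024 * 1024                       # hypothetical 256 MB KV-cache block
buf = cp.empty(NBYTES, dtype=cp.uint8)           # destination buffer in GPU memory

# Hypothetical path to a KV-cache block stored on an NVMe SSD.
f = kvikio.CuFile("/mnt/nvme/kv_cache_block.bin", "r")
nread = f.read(buf)                              # DMA into GPU memory when GDS is available
f.close()
print(f"read {nread} bytes directly into GPU memory")
```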
SK Hynix has responded quickly to this trend. According to reports, SK Hynix Vice President Kim Tae-seong revealed that the company is collaborating with NVIDIA on a prototype product named "AI-N P": a PCIe Gen 6 storage product supporting 25 million IOPS (input/output operations per second) planned for the end of this year, with performance expected to reach 100 million IOPS by the end of 2027. As major manufacturers accelerate their plans, NAND flash and SSDs are expected to enter a new cycle of rising volumes and prices in the AI inference era.
Below is the full text of the article from Korean media, translated by AI:
NVIDIA CEO Jensen Huang unveiled a mysterious memory platform called the "Inference Context Memory Platform" at the 2026 International Consumer Electronics Show (CES). Today, the "Technology and City" column will delve into what it actually is.
Keywords: KV Cache
At the NVIDIA Live conference held on the 5th (local time) in Las Vegas, NVIDIA CEO Jensen Huang mentioned the memory platform at the end of his speech. I couldn't help but perk up my ears. Could this be the next HBM?
Today's Star: Black Rack-Mounted NVIDIA ICMS (Inference Context Memory Storage). Image Source: NVIDIA
CEO Jensen Huang points to a black rack in a corner of the Vera Rubin artificial intelligence computing platform. This rack, which is the protagonist of our story today, contains vast storage space.
First, let me explain why Jensen Huang introduced this technology. We should start with "KV cache," which CEO Huang often mentions in official settings. Readers, you may have heard the term KV cache multiple times in recent articles about GPUs and AI hardware.
This keyword is crucial in the era of AI inference. It relates to AI's ability to understand conversational context and perform efficient computations. Let's take a simple example. Suppose you open OpenAI's ChatGPT or Google Gemini and ask a question about the South Korean pop singer G-Dragon.
If the user asks for objective information about G-Dragon's music, fashion, or career, the AI can respond based on the information it has learned. However, after chatting for a while, the user suddenly asks, "So why did he become an 'icon' of his era?" This is akin to asking a question that doesn't have a clear answer. At this point, the AI begins to reason.
This is where the KV cache, the keys and values, comes in. First, the keys: the AI uses key vectors to pin down who "that person" refers to in the context of the conversation and what the question is actually asking. Then it uses the values, the intermediate computational results about G-Dragon and the various data accumulated over the conversation with the user, to allocate attention weights, reason, and finally arrive at an answer.
Without the KV cache, every question would be recomputed as if it were the first, so the GPU would repeat the same work two or three times and lose efficiency, which can also push the AI toward hallucinations and wrong answers. With the KV cache, inference based on "attention computation" reuses the data accumulated over a long conversation and applies weights to it, yielding faster responses and more natural dialogue.
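To make that reuse concrete, here is a minimal sketch in Python/NumPy of how a decoder keeps a growing key/value cache and attends over it, so each new token only adds one key/value pair instead of reprocessing the whole conversation. It illustrates the general attention mechanism, not NVIDIA's implementation, and all names in it are made up.

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])     # similarity of the query to every cached key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax -> attention weights
    return weights @ V                        # weighted sum of cached values

class KVCache:
    """Toy per-conversation KV cache: grows by one (key, value) pair per token."""
    def __init__(self, dim):
        self.K = np.empty((0, dim))
        self.V = np.empty((0, dim))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

    def nbytes(self):
        return self.K.nbytes + self.V.nbytes

# Usage: each step appends one K/V pair and attends over the whole cached history,
# so earlier tokens' keys and values are never recomputed.
dim, cache = 64, KVCache(64)
rng = np.random.default_rng(0)
for step in range(1000):                      # 1000 tokens of "conversation"
    k, v, q = rng.standard_normal((3, dim))
    cache.append(k, v)
    _ = attend(q, cache.K, cache.V)
print(f"cache size after 1000 tokens: {cache.nbytes() / 1e6:.2f} MB (per head, per layer)")
```

Multiply that final figure by the number of layers, heads, and concurrent conversations, plus much larger hidden dimensions and multimodal context, and the capacity problem the article describes becomes clear.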
Image provided by NVIDIA
As the AI industry transitions from training to inference, the KV cache is no longer just auxiliary memory, and the capacity it requires keeps growing.
First, as more people integrate generative artificial intelligence into their daily lives, the irregular surge in data volume is inevitable. With the addition of image and video services, the demand for advanced reasoning and imagination in AI will further grow, and the data volume will explode.
As AI's ability to discover new information improves, it will generate a large amount of useful key-value cache (KV cache) across all kinds of scenarios as it interacts with users.
Facing this explosive growth of the KV cache, NVIDIA has also been managing GPU traffic: it divides GPUs into two groups, one that generates large amounts of KV cache and one that consumes it. Even so, there is not enough storage space to hold all of these caches.
Of course, a server already has plenty of memory inside. HBM sits next to the GPU → if that is not enough, DRAM modules are used → if that still isn't enough, SSDs inside the server may even be used. But CEO Jensen Huang seems to have concluded that this architecture will be hard to sustain in the coming inference era, and so he unveiled this black box at CES.
NVIDIA CEO Jensen Huang launched ICMS at CES 2026. Image provided by NVIDIA YouTube.
DPU + Ultra-large Capacity SSD = Dedicated Team for KV Cache Storage
This black server is the "Inference Context Memory Platform," abbreviated as ICMS. Let's take a closer look at its specifications.
First, the device driving ICMS is the DPU, or Data Processing Unit. Readers may be more familiar with GPUs and CPUs, but the hidden power source of the server—the DPU—is also worth a look.
NVIDIA CEO Jensen Huang unveiled the BlueField-4 DPU. Image provided by NVIDIA.
The DPU (Data Processing Unit) is like the army's administrative logistics officer. If the CPU is the company commander and the GPU is the computational assault soldier, the DPU handles the delivery of ammunition and food, and even communications and troop movement, so the CPU can focus on decisions and the GPU on the attack. NVIDIA's new DPU, "BlueField-4," has been given a new assignment: running ICMS.

Now let's take a closer look at the ICMS platform. This rack contains a total of 16 SSD trays.
Image source: NVIDIA
Each tray is equipped with four DPUs, with each DPU managing 150TB of SSD. This means that a single tray has a total of 600TB of cache SSD.
This is quite a lot of storage. Let's compare. Suppose that in a Blackwell GPU server, to maximize KV cache, eight 3.84TB general-purpose SSDs are installed in the available SSD bays. Each server then holds 30.72TB of SSD, so a GPU rack containing 18 servers has a total SSD capacity of 552.96TB.
In other words, a single ICMS rack holds more cache SSD capacity than a GPU "rack" can accommodate. The total SSD capacity of an ICMS rack is 600TB x 16 = 9600TB, more than double the SSD capacity of a complete Vera Rubin set of 8 GPU racks (4423.68TB, or 552.96TB x 8).
Image provided by NVIDIA
Jensen Huang stated in his CES speech: "Previously, the memory capacity of GPUs was 1TB, but with this platform, we have achieved 16TB of storage capacity."
On reflection, his statement seems quite accurate. A complete Vera Rubin platform consists of eight GPU racks, each with 72 GPUs, for a total of 576 GPUs. Dividing the ICMS total capacity of 9600TB by 576 GPUs gives approximately 16.7TB per GPU.
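For readers who want to check the math, the short script below simply re-derives the capacity figures quoted above; the tray, SSD, server, and GPU counts are taken from the article, and nothing else is assumed.

```python
# Re-deriving the article's capacity figures (all inputs come from the text above).
dpu_per_tray, tb_per_dpu, trays_per_rack = 4, 150, 16
icms_rack_tb = dpu_per_tray * tb_per_dpu * trays_per_rack    # 4 * 150 * 16 = 9600 TB

ssd_tb, ssds_per_server, servers_per_rack, gpu_racks = 3.84, 8, 18, 8
gpu_rack_tb = ssd_tb * ssds_per_server * servers_per_rack    # 552.96 TB per GPU rack
vera_rubin_tb = gpu_rack_tb * gpu_racks                      # 4423.68 TB across 8 racks

gpus = 72 * gpu_racks                                        # 576 GPUs in the full platform
print(icms_rack_tb)                   # 9600
print(vera_rubin_tb)                  # 4423.68
print(round(icms_rack_tb / gpus, 1))  # ~16.7 TB of ICMS capacity per GPU
```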
Although there are still concerns about the physical distance between servers and the transfer speed of the SSDs, the performance improvement of BlueField-4 alleviates these issues. Jensen Huang explained: "We achieved the same KV cache transfer speed of 200GB per second as before."
Additionally, existing GPU servers face network bottlenecks that limit the full utilization of large-capacity SSDs such as 7.68TB and 15.36TB. This DPU-based network improvement seems to be aimed at addressing these issues.
Is the golden age about to arrive for NAND flash memory, whose presence has so far been "zero"?
Image provided by NVIDIA
NVIDIA classifies this platform as memory tier 3.5. Tier 1 is HBM, tier 2 is DRAM modules, tier 3 is the local SSDs inside the server, and tier 4 is external storage; ICMS moves into the in-between territory separating tiers 3 and 4. Unlike DRAM, which is expensive and power-hungry, SSDs are faster and larger than hard drives and keep their data even through a power outage, and with a high-performance DPU managing them they become an ideal choice.
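As a rough mental model of that hierarchy (the capacity and latency numbers below are placeholders, not NVIDIA specifications), the tiers can be written as an ordered list with a trivial placement rule: put a KV-cache block in the fastest tier that still has room.

```python
# Rough mental model of the memory hierarchy described above; the capacity and latency
# figures are placeholders for illustration, not NVIDIA specifications.
TIERS = [
    # (name,             tier, free capacity in GB, rough access latency)
    ("HBM",               1.0,         288, "~100s of ns"),
    ("DRAM modules",      2.0,       2_000, "~100s of ns"),
    ("local SSD",         3.0,      30_000, "~10s of us"),
    ("ICMS (DPU + SSD)",  3.5,   9_600_000, "~100s of us over the network"),
    ("external storage",  4.0, 100_000_000, "ms"),
]

def place_kv_block(size_gb: float) -> str:
    """Toy placement rule: choose the fastest tier that can still hold the block."""
    for name, tier, free_gb, latency in TIERS:
        if size_gb <= free_gb:
            return f"tier {tier} ({name}, {latency})"
    return "evict or drop"

print(place_kv_block(0.5))      # small block fits in HBM
print(place_kv_block(50_000))   # a 50 TB context spills to ICMS, the new tier 3.5
```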
This platform clearly presents an enormous opportunity for Samsung Electronics and SK Hynix. A single rack adds 9,600TB of capacity, meaning they can sell many times more NAND flash than with existing NVIDIA racks, and that is just counting bits. On top of that, the product's developer is NVIDIA, the company every global AI player dreams of supplying, so the business opportunity is even greater.
Samsung Electronics' server solid-state drives. Even though the era of artificial intelligence has arrived, the prices of NAND flash memory and solid-state drives (SSD) have lagged behind, and a significant increase is expected in the first quarter of this year. Image provided by Samsung Electronics.
Over the past three years, despite the rapid growth of the artificial intelligence market, NAND flash memory and solid-state drives (SSDs) have not received much attention, mainly because their utilization has been low compared with HBM, which plays the key role in AI training.

Starting with ICMS, NVIDIA is also preparing a project aimed at raising SSD utilization further. It is part of the "Storage Next" plan (also known as SCADA, or Scaled Accelerated Data Access), under which GPUs executing AI computations would access NAND flash (SSDs) directly to fetch data, without going through a control unit such as the CPU. It is a bold idea aimed at eliminating the bottleneck between GPUs and SSDs.

SK Hynix has officially announced that it is developing AI-N P to align with this trend. SK Hynix Vice President Kim Tae-seong stated, "SK Hynix is actively conducting a proof of concept (PoC) called 'AI-N P' with NVIDIA."
He explained, "A storage prototype based on PCIe Gen 6 and supporting 25 million IOPS (input/output operations per second) is expected to be released by the end of this year," adding, "By the end of 2027, we will be able to produce products supporting up to 100 million IOPS." For reference, 25 million IOPS is more than 10 times the speed of today's solid-state drives.








