AMD has launched the Instinct MI350 series of HPC/AI GPUs featuring a new 3nm process, 185 billion transistors, up to 288 GB HBM3e memory, and support for FP4 and FP6 data types, offering significant performance improvements for AI workloads and competitive metrics against NVIDIA's offerings.
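New low-precision formats like FP4 trade accuracy for throughput and memory footprint. As a rough illustration, here is a minimal sketch of round-to-nearest quantization onto the FP4 (E2M1) value grid, assuming simple per-tensor scaling; the grid below is the standard E2M1 set, but the scaling scheme is illustrative, not AMD's actual implementation:

```python
# FP4 (E2M1) can represent only 0, 0.5, 1, 1.5, 2, 3, 4, 6 and their
# negatives. Quantization scales a tensor so its largest magnitude maps
# to the FP4 maximum (6.0), then snaps each value to the nearest point.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = [-v for v in FP4_E2M1[:0:-1]] + FP4_E2M1  # negatives + positives

def quantize_fp4(values):
    """Per-tensor scaling (illustrative), then round-to-nearest on the grid."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 6.0
    q = [min(FP4_GRID, key=lambda g: abs(v / scale - g)) for v in values]
    return q, scale  # dequantize as q[i] * scale

weights = [0.9, -0.31, 0.02, -1.2]
q, s = quantize_fp4(weights)
print(q)                    # codes on the E2M1 grid
print([v * s for v in q])   # dequantized approximation of weights
```

The coarse grid is why FP4 is paired with fine-grained scaling factors in practice: a single bad outlier would otherwise flatten most of a tensor to zero.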
The 65th TOP500 list highlights the continued dominance of the El Capitan supercomputer at Lawrence Livermore National Laboratory as the top system, with the US leading in Exascale systems alongside Frontier and Aurora. The list also features new advancements like the JUPITER Booster in Germany and updates on energy efficiency and benchmark performance, reflecting ongoing progress in high-performance computing.
NVIDIA has announced its new Blackwell GB200 NVL4 solution, featuring four GPUs and two Grace CPUs, designed for high-performance computing (HPC) and AI workloads. Alongside it, the Hopper H200 NVL is now generally available, offering 1.5x more HBM memory and 1.7x better LLM inference performance than its predecessor. The GB200 NVL4 module doubles the CPU and GPU count of the standard GB200 superchip, delivering significant gains in simulation and training performance. NVIDIA continues to push AI advancements with plans for future infrastructure developments.
Nvidia has unveiled its latest high-performance computing (HPC) and AI chip, the GB200 NVL4, which integrates four Blackwell GPUs and two Grace CPUs on a single board, consuming 5.4 kilowatts of power. This configuration, showcased at the Supercomputing event in Atlanta, allows for significant compute power without relying on Nvidia's proprietary interconnects, making it compatible with existing HPC systems from companies like HPE and Eviden. The GB200 NVL4 can deliver up to 10 petaFLOPS of FP64 compute per cabinet, although AMD-based systems still offer higher floating-point performance. Nvidia also announced the H200 NVL, a PCIe-based configuration that supports up to 13.3 petaFLOPS of FP8 performance with sparsity, emphasizing flexibility and compatibility with standard server racks.
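The "with sparsity" qualifier on figures like the FP8 number above refers to 2:4 structured sparsity: in every group of four weights, two are zeroed so the hardware can skip them, roughly doubling effective throughput. A minimal sketch of the pruning pattern follows; the magnitude-based selection shown is a common heuristic, not NVIDIA's exact pruning recipe:

```python
# 2:4 structured sparsity: keep exactly two values per group of four.
# Here we keep the two largest magnitudes in each group (a common
# heuristic) and zero the rest, producing the pattern sparse tensor
# cores can exploit.
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four."""
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        keep = sorted(range(4), key=lambda j: abs(group[j]))[2:]  # top-2 indices
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

row = [0.7, -0.1, 0.05, -0.9, 0.2, 0.3, -0.25, 0.01]
print(prune_2_of_4(row))
```

Because the zero positions are constrained to a fixed 2-of-4 pattern rather than arbitrary, the nonzero values and their indices pack into a compact format the GPU can decode at full rate.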
AMD has surpassed Nvidia in the November Top500 supercomputer rankings, largely due to the new "El Capitan" system, which features AMD's hybrid CPU-GPU compute engines. El Capitan, built by Hewlett Packard Enterprise, has achieved a peak theoretical performance of 2,746.4 petaflops, making it the most powerful supercomputer on the list. AMD's GPUs now account for 72.1% of the new performance added to the rankings, marking a significant shift in the high-performance computing landscape.
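A Top500 "peak theoretical performance" (Rpeak) figure is a derived paper number, not a measurement: it is the sum of the maximum FP64 rates of every compute element in the machine. A back-of-envelope sketch, using made-up round numbers rather than El Capitan's real configuration:

```python
# Rpeak assumes every device retires its maximum FP64 throughput every
# cycle; measured HPL performance (Rmax) always comes in lower.
def rpeak_pflops(n_devices, fp64_tflops_per_device):
    """Aggregate peak FP64 throughput in petaFLOPS for one device type."""
    return n_devices * fp64_tflops_per_device / 1000.0

# Illustrative only: 44,000 accelerators at 62.5 FP64 TFLOPS each
# (hypothetical round numbers, not El Capitan's actual specs).
print(rpeak_pflops(44_000, 62.5))  # 2750.0 petaFLOPS
```

This is why rankings also report the measured HPL result: the gap between Rpeak and Rmax reflects how much of the theoretical machine real software can actually reach.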
TSMC's Q1 2024 revenue reached $18.87 billion, up 12.9% year-over-year but down 3.8% quarter-over-quarter, with HPC processors driving the rebound even as the N3 (3nm) revenue share declined steeply. N3 wafer sales accounted for 9% of revenue, down from 15% in Q4 2023, while the N5 (5nm) and N7 (7nm) revenue shares increased. TSMC attributes the N3 decline to seasonally lower smartphone demand, but expects its HPC platform to keep increasing its revenue share, driven by AI processors.
China has quietly launched the Tianhe-3 supercomputer, believed to be the most powerful machine currently in existence, with a peak performance of 2.05 exaflops. The machine's architecture, including its processor, has sparked speculation, with insights suggesting it uses a hybrid device with CPU and accelerator compute as well as three different kinds of memory. The supercomputer is expected to support various application scenarios, including high-performance computing, AI large model training, and big data analysis.
NVIDIA's GH200 Grace Hopper Superchip, featuring a 72-core Arm CPU, has been tested and performs competitively against its AMD EPYC and Intel Xeon counterparts. The Grace design, aimed at high-performance computing and cloud applications, offers 72 Arm v9 cores per CPU (up to 144 in the dual-chip Grace Superchip configuration), LPDDR5X memory with ECC, and a coherent NVLink-C2C interface rated at 7X the bandwidth of PCIe Gen 5. In benchmarks, the Grace CPU came close to top Intel and AMD CPUs, with room for further optimization. While power and efficiency testing is pending, the chip's lower estimated power consumption compared to leading x86 CPUs is promising, positioning NVIDIA's entry into the Arm CPU segment as a strong start.
Nvidia's "Grace" CG100 server processor, designed for HPC simulation and modeling workloads, holds its own against X86 for HPC, with high core count, low thermal footprint, and LPDDR5 memory. The Grace-Grace superchip, with 144 Arm Neoverse "Demeter" V2 cores and 1 TB of physical memory, shows promising performance in benchmark tests conducted by major supercomputing labs. Early results from the Barcelona Supercomputing Center and the State University of New York campuses in Stony Brook and Buffalo demonstrate the Grace CPU's ability to handle HPC workloads effectively, making it a competitive option for HPC applications.
TSMC, the leading chip manufacturer, has acknowledged that the shortage of compute GPUs used for AI and HPC servers is due to a bottleneck in its chip-on-wafer-on-substrate (CoWoS) packaging capacity. The company is expanding its CoWoS capacity, but expects the shortage to persist for 1.5 years. TSMC produces the majority of processors for AI services, including compute GPUs, FPGAs, and specialized ASICs. The shortage is impacting the availability of high-bandwidth memory (HBM) used in these devices. Traditional outsourced semiconductor assembly and test (OSAT) companies are less motivated to offer advanced packaging services due to the higher financial risks involved. TSMC is investing billions in advanced packaging facilities to increase capacity.
The Texas Advanced Computing Center (TACC) is developing Stampede3, the successor to its capacity-class supercomputer Stampede2. Stampede3 will be a hybrid machine, featuring various Intel CPUs across different generations, as well as experimental nodes equipped with Ponte Vecchio Max Series GPU accelerators. The new system aims to improve performance and efficiency, with the addition of Sapphire Rapids processors with HBM2e memory offering up to a 2X performance improvement compared to regular Sapphire Rapids CPUs. The Stampede3 machine is expected to have nearly 4 petaflops of peak FP64 performance and will utilize Omni-Path networking and Vast Data's all-flash storage system.
Microsoft has announced Azure Quantum Elements, a new system that aims to accelerate chemical and materials science through the scale of Azure High Performance Computing (HPC) and the speed of AI. The system includes tools that will help scientists prepare for a future where a scaled quantum computer could accurately model the most complex molecules. Azure Quantum Elements delivers speed through proprietary software tailored to the needs of chemical and materials scientists and built on Microsoft’s investments in AI, HPC and future quantum technologies.
NVIDIA has confirmed that its Hopper-Next GPU will launch in 2024, succeeding the existing Hopper GPU. The new GPU is expected to carry the "Blackwell" name and to offer significant generational improvements over its predecessor, with more emphasis on specialized AI hardware such as the Transformer Engine. The Blackwell GPUs are likely to retain a monolithic design and use the new 3nm process node. NVIDIA is committed to launching a major GPGPU architecture every two years, and more information on Hopper-Next is expected by GTC 2024.
Intel is using the ISC High Performance supercomputing conference to lay out a fresh roadmap for HPC customers. The company is explaining some of the hardware development decisions it has made this year, including the pivot on Falcon Shores, transforming it from XPU into a pure GPU design, as well as more high-level details of what will eventually become Intel’s next HPC-class GPU. Intel is also offering an update on Aurora, its Sapphire Rapids with HBM + Ponte Vecchio based supercomputer for Argonne National Laboratory.
Given the intense demand for GPU compute, driven by an explosion in AI training for generative AI applications built on large language models, and AMD's desire to have more of a play in AI training with its GPUs, we think demand will outstrip Nvidia's supply. That shortfall means that, despite the massive advantage Nvidia's AI software stack holds over AMD's, the latter's GPUs are going to get some AI supply wins.