NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference
Table of Contents
- Introduction
- NVIDIA GB200 NVL72 Overview
- NVIDIA GB200 NVL36 and NVL72
- Fifth-generation NVLink and NVLink Switch System
- Performance Comparison
- Real-Time Inference and Training
Introduction
The NVIDIA GB200 NVL72 is a rack-scale system designed for training trillion-parameter large language models and serving them in real time. It combines liquid cooling, PCIe Gen 6 support, and fifth-generation NVLink for high-bandwidth GPU-to-GPU communication.
NVIDIA GB200 NVL72 Overview
The NVIDIA GB200 NVL72 is based on the new NVIDIA MGX design and scales to 72 GPUs in a single rack. It uses cold plates for efficient liquid cooling, PCIe Gen 6 for high-speed I/O, and NVLink connectors for direct GPU-to-GPU communication across the rack.
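For readers who want to inspect how a node's GPUs are wired together, NVLink state can be queried programmatically. Below is a minimal sketch using the pynvml bindings (assuming the nvidia-ml-py package and an NVIDIA driver are installed); it counts active NVLink links per GPU and works on any NVLink-capable system, not just GB200.

```python
# Minimal sketch: count active NVLink links per GPU with pynvml.
# Assumes the nvidia-ml-py package (pip install nvidia-ml-py) and an
# NVIDIA driver are available; not specific to GB200.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        active = 0
        # NVML_NVLINK_MAX_LINKS is the upper bound on links per GPU.
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link):
                    active += 1
            except pynvml.NVMLError:
                break  # no further links on this GPU
        print(f"GPU {i} ({name}): {active} active NVLink links")
finally:
    pynvml.nvmlShutdown()
```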
NVIDIA GB200 NVL36 and NVL72
A GB200 system can form NVLink domains of either 36 or 72 GPUs; each GB200 superchip pairs one Grace CPU with two Blackwell GPUs. The NVL72 configuration places 72 GPUs in a single rack using 18 dual-GB200 compute nodes, or 72 GPUs across two racks using 18 single-GB200 compute nodes per rack.
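The rack arithmetic behind these layouts is easy to verify. The short sketch below uses only the figures above (two GPUs per GB200 superchip) and confirms that both layouts produce a 72-GPU NVLink domain:

```python
# Sanity-check the GB200 rack layouts described above.
# Each GB200 superchip = 1 Grace CPU + 2 Blackwell GPUs.
GPUS_PER_SUPERCHIP = 2

def domain_gpus(nodes_per_rack: int, superchips_per_node: int, racks: int) -> int:
    """Total GPUs in an NVLink domain spanning `racks` racks."""
    return nodes_per_rack * superchips_per_node * GPUS_PER_SUPERCHIP * racks

# NVL72, single rack: 18 dual-GB200 compute nodes.
assert domain_gpus(nodes_per_rack=18, superchips_per_node=2, racks=1) == 72

# NVL72, two racks: 18 single-GB200 compute nodes per rack.
assert domain_gpus(nodes_per_rack=18, superchips_per_node=1, racks=2) == 72

print("Both layouts yield a 72-GPU NVLink domain.")
```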
Fifth-generation NVLink and NVLink Switch System
The GB200 NVL72 introduces fifth-generation NVLink, and the NVLink Switch System extends connectivity to as many as 576 GPUs in a single NVLink domain. With over 1 PB/s of total bandwidth and 240 TB of fast memory in that domain, the system is built for the largest models.
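The headline figure follows directly from the per-GPU bandwidth. A quick back-of-the-envelope check, using the 1.8 TB/s per-GPU number from the next section:

```python
# Back-of-the-envelope check of the "over 1 PB/s" figure:
# 576 GPUs, each with 1.8 TB/s of bidirectional NVLink bandwidth.
gpus_in_domain = 576
tb_per_gpu = 1.8  # TB/s, fifth-generation NVLink per GPU

total_tb = gpus_in_domain * tb_per_gpu
print(f"Aggregate NVLink bandwidth: {total_tb:.1f} TB/s "
      f"(~{total_tb / 1000:.2f} PB/s)")
# -> 1036.8 TB/s, i.e. just over 1 PB/s, matching the stated figure.
```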
Performance Comparison
The fifth-generation NVLink in the GB200 NVL72 provides 1.8 TB/s of bidirectional throughput per GPU, more than 14x the bandwidth of a PCIe Gen 5 x16 link. Compared with the same number of H100 GPUs, this translates into 4x faster training for large language models and up to 30x faster real-time inference for trillion-parameter models.
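The PCIe comparison can be made concrete. Assuming roughly 128 GB/s of bidirectional bandwidth for a PCIe Gen 5 x16 link (a commonly cited figure, not stated in this text), the ratio works out to about 14x:

```python
# Compare per-GPU NVLink bandwidth with a PCIe Gen 5 x16 link.
# The 128 GB/s bidirectional PCIe figure is an assumption based on
# commonly cited specs (~64 GB/s per direction), not from the text.
nvlink_gbs = 1800    # GB/s bidirectional, fifth-generation NVLink
pcie5_x16_gbs = 128  # GB/s bidirectional, PCIe Gen 5 x16 (assumed)

print(f"NVLink advantage: ~{nvlink_gbs / pcie5_x16_gbs:.0f}x")  # ~14x
```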
Real-Time Inference and Training
With the NVIDIA GB200 NVL72, real-time inference of a 1.8T-parameter MoE LLM becomes achievable, and training such models runs about 4x faster than on the same number of previous-generation H100 GPUs. The system also accelerates data analytics and data processing workloads.
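To see why a 1.8T-parameter model can be served within a single NVL72 domain, a rough weight-memory estimate helps. The sketch below assumes FP4 weights (0.5 bytes per parameter) and roughly 13.5 TB of aggregate HBM3e for an NVL72; both values are assumptions for illustration, not figures from this text.

```python
# Rough estimate: do 1.8T parameters fit in one NVL72's GPU memory?
# Assumptions (for illustration, not from the text above):
#   - FP4 weights: 0.5 bytes per parameter
#   - ~13.5 TB aggregate HBM3e across the 72 GPUs of an NVL72
params = 1.8e12
bytes_per_param = 0.5  # FP4
hbm_total_tb = 13.5

weights_tb = params * bytes_per_param / 1e12
print(f"Model weights: ~{weights_tb:.1f} TB of {hbm_total_tb} TB HBM3e "
      f"({weights_tb / hbm_total_tb:.0%} of capacity)")
# -> ~0.9 TB, leaving ample headroom for KV cache and activations.
```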