NVIDIA Technical Blog

NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference

Table of Contents

  1. Introduction
  2. NVIDIA GB200 NVL72 Overview
  3. NVIDIA GB200 NVL36 and NVL72
  4. Fifth-generation NVLink and NVLink Switch System
  5. Performance Comparison
  6. Real-Time Inference and Training

Introduction

The NVIDIA GB200 NVL72 is a rack-scale system designed to train and serve trillion-parameter large language models. It combines liquid cooling, PCIe Gen 6 I/O, and fifth-generation NVLink for high-bandwidth GPU-to-GPU communication.


NVIDIA GB200 NVL72 Overview

The NVIDIA GB200 NVL72 is based on the new NVIDIA MGX design and connects up to 72 Blackwell GPUs and 36 Grace CPUs in a single rack. Cold plates deliver liquid cooling directly to the compute trays, PCIe Gen 6 carries high-speed I/O, and NVLink connectors provide seamless GPU-to-GPU communication across the rack.


NVIDIA GB200 NVL36 and NVL72

The GB200 rack design supports NVLink domains of either 36 or 72 GPUs. NVL36 places 36 GPUs in a single rack with 18 single GB200 compute nodes. NVL72 places 72 GPUs either in a single rack with 18 dual GB200 compute nodes, or across two racks with 18 single GB200 compute nodes each.
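To make the configurations concrete, here is a minimal Python sketch that derives the GPU counts from the node layouts above. It assumes only that one GB200 superchip carries two Blackwell GPUs (and that "single" and "dual" nodes hold one and two superchips, respectively); all names are illustrative.

```python
# Back-of-envelope check of the NVL36/NVL72 configurations described above.
# Assumption: one GB200 superchip = 1 Grace CPU + 2 Blackwell GPUs.
GPUS_PER_SUPERCHIP = 2

def gpus(racks: int, nodes_per_rack: int, superchips_per_node: int) -> int:
    """Total Blackwell GPUs in an NVLink domain for a given rack layout."""
    return racks * nodes_per_rack * superchips_per_node * GPUS_PER_SUPERCHIP

configs = {
    "NVL36 (1 rack, 18 single GB200 nodes)":  gpus(1, 18, 1),
    "NVL72 (1 rack, 18 dual GB200 nodes)":    gpus(1, 18, 2),
    "NVL72 (2 racks, 18 single GB200 nodes)": gpus(2, 18, 1),
}

for name, n in configs.items():
    print(f"{name}: {n} GPUs")
# NVL36 layout -> 36 GPUs; both NVL72 layouts -> 72 GPUs
```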


Fifth-generation NVLink and NVLink Switch System

The GB200 NVL72 introduces fifth-generation NVLink, which can connect up to 576 GPUs in a single NVLink domain; at that scale the fabric offers over 1 PB/s of total bandwidth and 240 TB of fast memory. Within a single NVL72 rack, the 72 GPUs communicate over an aggregate 130 TB/s of low-latency bandwidth, enabling unprecedented performance for the largest models.
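The headline aggregate figures follow directly from the per-GPU NVLink rate of 1.8 TB/s. A quick arithmetic sketch (illustrative only):

```python
# Aggregate NVLink bandwidth from the per-GPU figure of 1.8 TB/s (bidirectional).
PER_GPU_TBPS = 1.8

nvl72_domain = 72 * PER_GPU_TBPS    # 129.6 TB/s, ~130 TB/s within one NVL72 rack
max_domain   = 576 * PER_GPU_TBPS   # 1036.8 TB/s, i.e. over 1 PB/s across the
                                    # largest fifth-generation NVLink domain

print(f"NVL72 domain:   {nvl72_domain:.1f} TB/s")
print(f"576-GPU domain: {max_domain:.1f} TB/s (~{max_domain / 1000:.2f} PB/s)")
```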


Performance Comparison

The fifth-generation NVLink in the GB200 NVL72 provides 1.8 TB/s of bidirectional throughput per GPU, more than 14x the bandwidth of a PCIe Gen 5 x16 link. Compared with the same number of NVIDIA H100 GPUs, this translates into 4x faster training for large language models and up to 30x faster real-time trillion-parameter LLM inference.
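The PCIe comparison can be sanity-checked the same way. Assuming a PCIe Gen 5 x16 link at roughly 64 GB/s per direction (~128 GB/s bidirectional), the sketch below reproduces the "more than 14x" gap:

```python
# Per-GPU NVLink vs PCIe Gen 5 x16, both measured as bidirectional throughput.
NVLINK_GBPS = 1800            # 1.8 TB/s per GPU, fifth-generation NVLink
# Assumption: PCIe Gen 5 x16 ~ 32 GT/s * 16 lanes ~ 64 GB/s per direction.
PCIE5_X16_GBPS = 2 * 64       # ~128 GB/s bidirectional

print(f"NVLink / PCIe Gen 5 x16: {NVLINK_GBPS / PCIE5_X16_GBPS:.1f}x")
# -> ~14.1x, consistent with the "more than 14x" figure
```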


Real-Time Inference and Training

With the NVIDIA GB200 NVL72, real-time inference on a 1.8T-parameter MoE LLM becomes achievable, and training such models is 4x faster than on the same number of previous-generation Hopper GPUs. The rack-scale design also accelerates data processing, making it well suited for demanding data analytics and data science workloads.
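As a rough illustration of why a 1.8T-parameter MoE fits comfortably inside one NVL72 domain, consider the weight footprint at low precision. The sketch below is a back-of-envelope estimate, not a sizing guide: the 13.5 TB HBM3e figure is the published NVL72 spec, and FP4-quantized weights are an assumption.

```python
# Back-of-envelope: do 1.8T parameters fit in an NVL72's GPU memory?
PARAMS = 1.8e12
BYTES_PER_PARAM_FP4 = 0.5          # assumption: weights quantized to FP4
HBM3E_PER_RACK_TB = 13.5           # published NVL72 spec (72 GPUs)

weights_tb = PARAMS * BYTES_PER_PARAM_FP4 / 1e12
print(f"Weights: {weights_tb:.1f} TB of {HBM3E_PER_RACK_TB} TB HBM3e "
      f"({100 * weights_tb / HBM3E_PER_RACK_TB:.0f}% of rack capacity)")
# -> 0.9 TB, roughly 7% of rack HBM, leaving headroom for KV cache
#    and activations during real-time serving
```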