New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility

NVIDIA NeMo Framework Updates
- The latest release of the NVIDIA NeMo framework includes optimizations and new features for LLM training.
- It introduces new parallelism techniques that make it easier to train models on the NVIDIA AI platform.
- These improvements increase Tensor Core utilization on GPUs based on the NVIDIA Hopper architecture.
- Together, they improve both performance and versatility, delivering exceptional training throughput for Llama 2 models.
Parallelism Techniques with FSDP
- The new release of NeMo introduces support for Fully Sharded Data Parallelism (FSDP).
- FSDP shards model parameters, gradients, and optimizer states across data-parallel ranks, reducing the memory capacity each GPU must hold.
- It enables effective distribution and management of data and memory for large language models (LLMs).
- FSDP provides performance competitive with traditional parallelism methods; a minimal sketch of the technique follows this list.
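For readers unfamiliar with the mechanics, below is a minimal, self-contained sketch of fully sharded data parallelism using PyTorch's FSDP wrapper. It is not NeMo's internal implementation; the toy model and hyperparameters are illustrative only, and in NeMo itself FSDP would typically be enabled through the training configuration rather than by wrapping modules by hand.

```python
# Minimal PyTorch FSDP sketch: shard parameters, gradients, and optimizer state
# across data-parallel ranks so each GPU holds only a fraction of the model state.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")  # one process per GPU (e.g. via torchrun)
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Toy stand-in for a stack of transformer blocks; a real LLM would be
    # wrapped layer by layer with an auto-wrap policy.
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)
    ).cuda()

    # FSDP shards model state across all ranks; full parameters are materialized
    # only while a wrapped unit is being computed.
    model = FSDP(model)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).pow(2).mean()  # dummy loss for illustration
    loss.backward()                # gradients are reduce-scattered back to shards
    optim.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<num_gpus>`, each process keeps only its shard of parameters and optimizer state, which is what lowers the per-GPU memory requirement.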
Mixture of Experts (MoE)
- The latest NeMo release supports MoE-based LLM architectures with expert parallelism.
- Expert parallelism can be combined with other parallelism dimensions to distribute experts across data parallel ranks.
- Increasing the number of parameters in a model improves its ability to absorb information and generalize; MoE scales parameter count while keeping compute per token roughly constant, since each token is routed to only a few experts (see the sketch after this list).
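To make the routing idea concrete, here is a conceptual single-GPU sketch of an MoE feed-forward layer with top-2 routing. The class and parameter names (SimpleMoE, num_experts, top_k) are illustrative, not the NeMo or Megatron API; with expert parallelism, the experts below would instead be placed on different ranks and tokens exchanged between them via all-to-all communication.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-k token routing."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                # x: [num_tokens, d_model]
        logits = self.router(x)                          # [num_tokens, num_experts]
        weights, idx = logits.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)             # renormalize routing weights
        out = torch.zeros_like(x)
        # Each token is processed by only its top-k experts, so parameter count
        # grows with num_experts while per-token compute stays roughly constant.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(SimpleMoE()(tokens).shape)  # torch.Size([16, 512])
```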
Reinforcement Learning from Human Feedback (RLHF)
- NeMo's RLHF support has been enhanced with the ability to use TensorRT-LLM for inference.
- TensorRT-LLM enables pipeline parallelism for inference within the RLHF loop, delivering better performance with fewer nodes.
- Using TensorRT-LLM in the RLHF loop with H100 GPUs achieves up to a 5.6x performance increase; a schematic of the loop follows this list.
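The sketch below is a schematic PPO-style RLHF loop with toy stand-in functions; it is not the NeMo-Aligner or TensorRT-LLM API. It illustrates why faster inference matters: the rollout-generation step is pure inference and typically dominates each iteration, so serving it with an optimized engine such as TensorRT-LLM speeds up the whole loop.

```python
import random

def generate_rollouts(policy, prompts):
    # Inference-bound step: in NeMo's RLHF this generation is handed to TensorRT-LLM.
    return [f"{p} -> response(v{policy['version']})" for p in prompts]

def score(prompts, responses):
    # Toy reward model: a real one assigns a learned scalar reward per response.
    return [random.random() for _ in responses]

def ppo_update(policy, prompts, responses, rewards):
    # Stand-in for a PPO gradient step on the policy, run in the training framework.
    policy["version"] += 1

def rlhf_loop(num_iterations=3):
    policy = {"version": 0}
    prompts = ["Explain FSDP.", "What is a Mixture of Experts?"]
    for _ in range(num_iterations):
        responses = generate_rollouts(policy, prompts)   # 1. rollout (inference)
        rewards = score(prompts, responses)              # 2. reward-model scoring
        ppo_update(policy, prompts, responses, rewards)  # 3. policy optimization
    return policy

if __name__ == "__main__":
    print(rlhf_loop())
```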
NVIDIA NeMo Versatility
- NeMo is regularly updated to optimize the performance of generative AI model training.
- The NVIDIA platform accelerates the entire AI workflow, from data preparation to model training to deploying inference.
- It provides versatility, supporting a range of parallelism techniques that can be combined for efficient model training and deployment (a small sketch of how they compose follows this list).
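As a small illustration of how the parallelism techniques mentioned above compose, the sketch below computes the data-parallel size left over once tensor- and pipeline-parallel sizes are chosen. The function name and example values are hypothetical, not NeMo configuration keys.

```python
# Illustrative only: the product of tensor-parallel, pipeline-parallel, and
# data-parallel sizes must equal the total number of GPUs in the job.
def data_parallel_size(total_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    model_parallel = tensor_parallel * pipeline_parallel
    assert total_gpus % model_parallel == 0, "GPU count must be divisible by TP * PP"
    return total_gpus // model_parallel

# Example: 64 GPUs with 4-way tensor parallelism and 2-way pipeline parallelism
# leaves 8-way data parallelism.
print(data_parallel_size(total_gpus=64, tensor_parallel=4, pipeline_parallel=2))  # 8
```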