New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility

NVIDIA NeMo Framework Updates

  • The latest release of the NVIDIA NeMo framework includes performance optimizations and new features.
  • It introduces additional parallelism techniques that make it easier to train large models on the NVIDIA AI platform.
  • These improvements increase Tensor Core utilization on GPUs based on the NVIDIA Hopper architecture.
  • Together, they improve performance and versatility, delivering exceptional training throughput for Llama 2 models.

Parallelism Techniques with FSDP

  • The new release of NeMo introduces the Fully Sharded Data Parallelism (FSDP) technique (a minimal sketch follows this list).
  • FSDP shards model states across GPUs, reducing memory capacity requirements and improving memory bandwidth utilization by 1.8x.
  • It enables effective distribution and management of data and memory for large language models (LLMs).
  • FSDP provides performance competitive with traditional parallelism methods.
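
NeMo builds its FSDP support on top of PyTorch. The snippet below is a minimal sketch of the core idea using PyTorch's torch.distributed.fsdp module directly, not NeMo's own API; the layer shape, hyperparameters, and torchrun launch are illustrative assumptions.

# Minimal FSDP sketch (PyTorch API, not NeMo's); launch under torchrun,
# e.g. torchrun --nproc_per_node=8 fsdp_sketch.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in for a single transformer layer of an LLM.
layer = nn.TransformerEncoderLayer(d_model=4096, nhead=32, batch_first=True).cuda()

# Wrapping with FSDP shards parameters, gradients, and optimizer state
# across data-parallel ranks; full parameters are gathered only for the
# layer currently being computed, cutting per-GPU memory requirements.
model = FSDP(layer)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(2, 128, 4096, device="cuda")
loss = model(x).sum()
loss.backward()
optimizer.step()

In practice, FSDP is applied with an auto-wrap policy so that each transformer block becomes its own shard unit, which is what keeps the per-GPU working set small for full LLMs.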

Mixture of Experts (MoE)

  • The latest NeMo release supports MoE-based LLM architectures with expert parallelism (a toy routing sketch follows this list).
  • Expert parallelism can be combined with other parallelism dimensions to distribute experts across data parallel ranks.
  • MoE increases a model's parameter count, and with it its capacity to absorb information and generalize, without a proportional increase in compute per token.
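
Expert parallelism itself requires a distributed setup, but the routing idea behind MoE can be shown in a few lines. The toy layer below uses top-1 routing on a single process; all names and sizes are illustrative, and in an expert-parallel deployment the experts would be split across GPUs, with tokens dispatched to the rank that owns their assigned expert.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer with top-1 routing on a single process."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Route every token to its single best expert.
        gate_probs = F.softmax(self.router(x), dim=-1)
        top_prob, top_idx = gate_probs.max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only the tokens assigned to expert e pay for its compute.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoELayer(d_model=512, d_ff=2048, num_experts=8)
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512])

Because only one expert runs per token, the total parameter count grows with the number of experts while per-token compute stays roughly constant, which is why MoE scales capacity so cheaply.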

Reinforcement Learning from Human Feedback (RLHF)

  • NeMo's RLHF support has been enhanced with the ability to use TensorRT-LLM for the inference stage of the loop (a schematic of the loop follows this list).
  • TensorRT-LLM enables pipeline parallelism for RLHF, resulting in better performance with fewer nodes.
  • Using TensorRT-LLM in the RLHF loop with H100 GPUs achieves up to a 5.6x performance increase.
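
The structure of the loop shows why inference speed matters so much: every RLHF step generates responses before any training happens. The skeleton below is schematic only; generate_with_trtllm, score, and ppo_update are hypothetical placeholders standing in for the TensorRT-LLM generation engine, the reward model, and the PPO update, not NeMo or TensorRT-LLM APIs.

# Schematic PPO-style RLHF loop; all callables here are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float

def generate_with_trtllm(prompts):
    # Placeholder for the inference-heavy generation stage that NeMo can
    # offload to a TensorRT-LLM engine built from the current policy weights.
    return [p + " ... generated response ..." for p in prompts]

def score(prompt, response):
    # Placeholder reward-model call.
    return len(response) / 100.0

def ppo_update(rollouts):
    # Placeholder policy/critic update on the training GPUs.
    return sum(r.reward for r in rollouts) / len(rollouts)

prompts = ["Explain FSDP in one sentence.", "What is expert parallelism?"]
for step in range(3):
    responses = generate_with_trtllm(prompts)        # generation (inference)
    rollouts = [Rollout(p, r, score(p, r)) for p, r in zip(prompts, responses)]
    mean_reward = ppo_update(rollouts)               # learning (training)
    print(f"step {step}: mean reward {mean_reward:.3f}")

Accelerating the generation stage is where the reported speedup comes from, since it dominates the wall-clock time of each RLHF step.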

NVIDIA NeMo Versatility

  • NeMo is regularly updated to optimize the performance of generative AI model training.
  • The NVIDIA platform accelerates the entire AI workflow, from data preparation to model training to deploying inference.
  • NeMo adds to this versatility by supporting a range of parallelism techniques for efficient model training and deployment.