NVIDIA Technical Blog

NVIDIA Speech and Translation AI Models Set Records for Speed and Accuracy

Table of Contents

  1. NVIDIA Parakeet Speech Recognition Models
  2. Parakeet-TDT Model for Turbocharging Speech Recognition
  3. Canary Multilingual Model for Speech Recognition and Translation
  4. P-Flow for Custom Voice Creation

NVIDIA Parakeet Speech Recognition Models

The NVIDIA Parakeet family of automatic speech recognition (ASR) models, which includes Parakeet CTC 1.1B and Parakeet RNNT 1.1B among others, sets records for both speed and accuracy. Their combination of fast inference and high transcription accuracy places them among the top performers on the Hugging Face Open ASR Leaderboard.

Parakeet-TDT Model for Turbocharging Speech Recognition

The Parakeet-TDT 1.1B model achieves exceptional accuracy in transcribing spoken English while running 64% faster than other Parakeet models in the evaluations on the Hugging Face leaderboard. It shows that efficiency and accuracy in speech recognition can improve together rather than trading off against each other.
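To make a claim such as "64% faster" concrete, here is a minimal sketch of one common way to compare ASR speed. It assumes speed is expressed as an inverse real-time factor (seconds of audio processed per second of compute), which this summary does not confirm as the metric used. The function names `inverse_real_time_factor` and `percent_faster` are hypothetical, and the timings in the usage lines are made-up illustrative values, not measurements of any Parakeet model.

```python
def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute (higher is faster)."""
    if audio_seconds < 0 or processing_seconds <= 0:
        raise ValueError("audio_seconds must be >= 0 and processing_seconds > 0")
    return audio_seconds / processing_seconds


def percent_faster(candidate_speed: float, baseline_speed: float) -> float:
    """Relative throughput gain of the candidate over the baseline, in percent."""
    if baseline_speed <= 0:
        raise ValueError("baseline_speed must be > 0")
    return (candidate_speed - baseline_speed) / baseline_speed * 100.0


# Illustrative (invented) timings for one hour of audio.
baseline = inverse_real_time_factor(audio_seconds=3600.0, processing_seconds=36.0)
candidate = inverse_real_time_factor(audio_seconds=3600.0, processing_seconds=20.0)
gain = percent_faster(candidate, baseline)
print(f"baseline={baseline:.1f}x, candidate={candidate:.1f}x, gain={gain:.1f}%")
```

Under this reading, "X% faster" describes a throughput ratio, which is not the same as an X% reduction in processing time. That distinction is worth checking whenever speed figures are compared.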

Canary Multilingual Model for Speech Recognition and Translation

NVIDIA's Canary 1B is a multilingual, multitask model that excels in accuracy across various benchmarks. It translates in both directions between English and each of German, French, and Spanish, and it leads the Hugging Face Open ASR Leaderboard with an average word error rate (WER) of 6.67%. Its architecture and feature set make it a versatile, high-performing model for both speech recognition and translation.
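Word error rate, the metric quoted above, is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition. It is not the leaderboard's evaluation code: it skips text normalization such as casing and punctuation handling, the function names `word_error_rate` and `average_wer` are hypothetical, and treating the "average" as a plain mean over test sets is an assumption.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def average_wer(per_dataset_wer: Sequence[float]) -> float:
    """Unweighted mean of per-dataset WER values (an assumed aggregation)."""
    if not per_dataset_wer:
        raise ValueError("need at least one WER value")
    return sum(per_dataset_wer) / len(per_dataset_wer)


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.2%}")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.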

P-Flow for Custom Voice Creation

NVIDIA's P-Flow model won the LIMMITS '24 challenge by demonstrating that it can create personalized, high-quality voices for individual speakers. Given only a short speech prompt, P-Flow can synthesize speech in multiple languages while accurately reproducing the speaker's vocal qualities. The model combines speaker voice adaptation with generative speech synthesis, setting a new standard for customized voice creation.