Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face
Summary
Nvidia has released Parakeet-TDT-0.6B-v2, an automatic speech recognition model that is openly available to download.
It can transcribe 60 minutes of audio in about one second on Nvidia GPU hardware, with an average word error rate of 6.05% on the Hugging Face Open ASR Leaderboard.
This error rate places it near proprietary transcription models such as OpenAI’s GPT-4o-transcribe (2.46%) and ElevenLabs Scribe (3.3%), although both of those report lower error rates.
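For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name `word_error_rate` and the sample sentences are illustrative, and published benchmark figures typically also normalize text (casing, punctuation) and average over several datasets, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Illustrative sentences: one substitution ("sat" -> "sit") and one
# deletion ("the") against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")
```

On this scale, lower is better: a 6.05% score means roughly six word-level errors per hundred reference words.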
The model supports a range of applications, including transcription services, voice assistants, subtitle generators and conversational AI platforms, and it can be used commercially under a Creative Commons CC-BY-4.0 license.
It can be accessed via Nvidia’s NeMo toolkit or Hugging Face.
It was trained on the Granary dataset, which includes around 120,000 hours of English audio. Nvidia plans to make the dataset publicly available after presenting it at Interspeech 2025.
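As a rough illustration of the NeMo access route mentioned above, the sketch below loads the checkpoint by its Hugging Face identifier and transcribes a list of audio files. It assumes the NeMo ASR package is installed (for example via `pip install -U "nemo_toolkit[asr]"`) and that a suitable GPU is available; the helper name `transcribe_files` and the file name in the usage note are hypothetical, and the return type of `transcribe` has varied between NeMo versions, so the code accepts either plain strings or objects with a `text` attribute.

```python
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"


def transcribe_files(paths):
    """Return one transcript string per audio file path (hypothetical helper)."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then restores it.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)
    results = model.transcribe(list(paths))

    # Assumption: newer NeMo releases return hypothesis objects with a
    # `.text` field, older ones return plain strings.
    return [getattr(result, "text", result) for result in results]
```

Calling `transcribe_files(["sample.wav"])` with a local WAV file would then return a one-element list containing the transcript.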