OpenAI Introduces Whisper large-v3-turbo, a Faster Speech Transcription Model

TapTechNews, October 3rd: OpenAI announced the Whisper large-v3-turbo speech transcription model at its DevDay event on October 1st. The model has 809 million parameters in total and runs 8 times faster than large-v3 with almost no loss in quality.

Whisper large-v3-turbo is an optimized version of large-v3: it keeps only 4 decoder layers, compared with 32 in large-v3.

At 809 million parameters, large-v3-turbo is slightly larger than the medium model (769 million parameters) but much smaller than the large model (1.55 billion parameters).

OpenAI states that large-v3-turbo is 8 times faster than the large model and requires 6 GB of VRAM, versus 10 GB for the large model.
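The figures above can be put side by side in a short script. The numbers are taken from the article; the medium model's VRAM requirement is not stated there, so it is left unset:

```python
# Model comparison figures as reported in the article.
# medium's VRAM requirement is not given, so it is left as None.
models = {
    "medium":         {"params_m": 769,  "vram_gb": None},
    "large-v3":       {"params_m": 1550, "vram_gb": 10},
    "large-v3-turbo": {"params_m": 809,  "vram_gb": 6},
}

turbo, large = models["large-v3-turbo"], models["large-v3"]
print(f"parameter ratio (turbo/large): {turbo['params_m'] / large['params_m']:.2f}")
print(f"VRAM saved vs. large: {large['vram_gb'] - turbo['vram_gb']} GB")
```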


The Whisper large-v3-turbo model file is 1.6 GB. OpenAI continues to release Whisper, including code and model weights, under the MIT license.
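Because the code and weights are MIT-licensed, the model can be run locally with the open-source `whisper` package. A minimal sketch, assuming `pip install openai-whisper` and that the release you have accepts the `"turbo"` model alias for large-v3-turbo:

```python
def transcribe(path: str, model_name: str = "turbo") -> str:
    """Transcribe an audio file with the open-source Whisper package.

    Assumes `pip install openai-whisper`; "turbo" is the alias recent
    releases use for the large-v3-turbo checkpoint.
    """
    import whisper  # imported lazily so the sketch loads without the package
    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)
    return result["text"]

# Usage (with some local audio file):
#   text = transcribe("speech.wav")
```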

TapTechNews cited a test result from Awni Hannun: on an M2 Ultra, the model transcribed 12 minutes of audio in 14 seconds.
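For context, the reported numbers work out to roughly 51x real time:

```python
# Real-time factor implied by the benchmark in the article:
# 12 minutes of audio transcribed in 14 seconds (M2 Ultra).
audio_seconds = 12 * 60
wall_seconds = 14
print(f"~{audio_seconds / wall_seconds:.0f}x real time")
```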


GitHub: https://github.com/openai/whisper/discussions/2363

Model download: https://huggingface.co/openai/whisper-large-v3-turbo

Online experience: https://huggingface.co/spaces/hf-audio/whisper-large-v3-turbo