Whisper API For Speech-To-Text Transcription & Translation
Whisper is an automatic speech recognition system that, according to OpenAI, enables "robust" transcription in numerous languages as well as translation from those languages into English. Access through the API costs $0.006 per minute of audio.
Among the file types it accepts are MP3, MP4, M4A, MPEG, MPGA, WAV, and WEBM.
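As a rough illustration of the pricing and file-type details above, here is a minimal Python sketch. The helper names (is_supported, estimate_cost, transcribe) are hypothetical, the cost estimate assumes simple linear pricing with no rounding rules, and the API call assumes the official openai Python package (version 1 or later) with an API key set in the OPENAI_API_KEY environment variable. Treat it as a starting point, not a definitive integration.

```python
from pathlib import Path

# File types listed in the article; the live API may accept others.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "m4a", "mpeg", "mpga", "wav", "webm"}

# Price quoted in the article, in US dollars per minute of audio.
PRICE_PER_MINUTE = 0.006


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the article's list."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the charge in dollars, assuming simple linear pricing."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE, 4)


def transcribe(path: str, translate_to_english: bool = False) -> str:
    """Send one audio file to the API and return the resulting text."""
    if not is_supported(path):
        raise ValueError(f"Unsupported file type: {path}")
    # Imported here so the helpers above work without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from OPENAI_API_KEY
    # Translation always targets English; transcription keeps the source language.
    endpoint = (
        client.audio.translations
        if translate_to_english
        else client.audio.transcriptions
    )
    with open(path, "rb") as audio_file:
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text
```

Under these assumptions, a 10-minute recording would come to about $0.06, and a call such as transcribe("interview.mp3", translate_to_english=True) would return English text for non-English speech.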
Many companies have developed speech recognition systems, which are at the heart of software and services from tech giants like Amazon, Google, and Meta.
According to Greg Brockman, president and chairman of OpenAI, what sets Whisper apart is that it was trained on 680,000 hours of multilingual and "multitask" data gathered from the web.
That training results in better handling of unusual accents, background noise, and jargon.
Whisper is not without flaws, though, especially when it comes to "next-word" prediction.
OpenAI warns that Whisper's transcriptions may contain phrases that weren't actually spoken because the system was trained on a lot of noisy data.
This may happen because the model is simultaneously trying to transcribe the audio recording and predict the next word in the speech.
Whisper also doesn't perform equally well across languages, showing a higher error rate for speakers of languages that are underrepresented in its training data.
Despite this, OpenAI expects Whisper's transcription capabilities to be used to improve existing software, services, and applications.
The Whisper API is already being used by the AI-powered language learning app Speak to enable a brand-new in-app virtual speaking companion.