Voicebox AI: The Ultimate Text-to-Speech Solution

Meta has unveiled Voicebox, a generative text-to-speech model that promises to do for the spoken word what ChatGPT and DALL-E did for text and image generation, respectively. The system was trained on more than 50,000 hours of unfiltered audio and can generate more conversational-sounding speech, regardless of the languages spoken by each party. Voicebox is reportedly capable of actively editing audio clips, removing noise from speech and even replacing misspoken words. Meta's AI reportedly outperformed the current state of the art in both intelligibility and "audio similarity" while operating as much as 20 times faster than today's best TTS systems. However, neither the Voicebox app nor its source code is being released to the public at this time.
Source: Engadget