Voicebox AI: The Ultimate Text-to-Speech Solution

Source: Engadget
TL;DR Summary

Meta has unveiled Voicebox, its generative text-to-speech model that promises to do for the spoken word what ChatGPT and DALL-E, respectively, did for text and image generation. The system was trained on more than 50,000 hours of unfiltered audio and can generate more conversational-sounding speech, regardless of the languages spoken by each party. Voicebox is reportedly capable of actively editing audio clips, eliminating noise from the speech and even replacing misspoken words. Meta's AI reportedly outperformed the current state of the art in both intelligibility and "audio similarity" while operating as much as 20 times faster than today's best TTS systems. However, neither the Voicebox app nor its source code is being released to the public at this time.
