Voicebox AI: The Ultimate Text-to-Speech Solution

June 16, 2023 at 03:02 PM

TL;DR Summary

Meta has unveiled Voicebox, a generative text-to-speech model that promises to do for the spoken word what ChatGPT and Dall-E, respectively, did for text and image generation. The system was trained on more than 50,000 hours of unfiltered audio and can generate more conversational-sounding speech, regardless of the languages spoken by each party. Voicebox is reportedly capable of actively editing audio clips, removing noise from the speech and even replacing misspoken words. Meta's AI reportedly outperformed the current state of the art in both intelligibility and "audio similarity" while operating as much as 20 times faster than today's best TTS systems. However, neither the Voicebox app nor its source code is being released to the public at this time.

Topics: technology #ai #flow-matching #generative-models #meta #text-to-speech #voicebox

Meta's Voicebox AI is a Dall-E for text-to-speech (Engadget)
Introducing Voicebox: The Most Versatile AI for Speech Generation (Meta Store)