OpenAI's ChatGPT Enhances AI Capabilities to Speak, Listen, and Process Images

September 25, 2023 at 06:38 PM

TL;DR Summary

OpenAI has announced an update that lets ChatGPT analyze images and hold spoken conversations. With image input, users can upload a picture and discuss it with the chatbot; with voice, they can talk to ChatGPT and hear it reply aloud. OpenAI plans to roll out both features to ChatGPT Plus and Enterprise subscribers over the next two weeks. The company says image recognition can help with a range of everyday tasks, and users can circle part of an image to direct ChatGPT's attention to it. OpenAI has not disclosed how the multimodal functionality works; Ars Technica speculates that it may use CLIP, OpenAI's model for aligning image and text representations. The voice feature offers five synthetic voices and relies on OpenAI's Whisper speech recognition system to transcribe what users say.

ChatGPT update enables its AI to "see, hear, and speak," according to OpenAI (Ars Technica)
ChatGPT Can Now Chat Aloud With You (And Yes, It Sounds Pretty Much Human) (The Wall Street Journal)
ChatGPT can now 'speak,' listen and process images, OpenAI says (CNBC)
ChatGPT Can Now Respond With Spoken Words (The New York Times)
OpenAI's ChatGPT chatbot now supports prompting with voice and images (The Verge)

Original article: Ars Technica
