OpenAI's ChatGPT Enhances AI Capabilities to Speak, Listen, and Process Images

Source: Ars Technica
TL;DR Summary

OpenAI has announced an update to ChatGPT that lets its AI models analyze images and hold spoken conversations. With image recognition, users can upload a picture and discuss it with ChatGPT. With the new voice feature, they can have back-and-forth spoken exchanges with the chatbot. OpenAI plans to roll out both features to ChatGPT Plus and Enterprise subscribers over the next two weeks. The company says image recognition can help with a range of everyday tasks, and users can circle part of an image to focus ChatGPT's attention on it. OpenAI has not disclosed technical details of the multimodal functionality, though it has been speculated that the company may use CLIP to align image and text representations. The voice feature offers five synthetic voices and relies on OpenAI's Whisper speech recognition system to transcribe what users say.
