OpenAI's ChatGPT Enhances AI Capabilities to Speak, Listen, and Process Images

OpenAI has announced an update to ChatGPT that lets its AI models analyze images and hold spoken conversations. With image recognition, users can upload images and discuss them with ChatGPT; with voice, they can carry on back-and-forth spoken exchanges. OpenAI plans to roll out these features to Plus and Enterprise subscribers over the next two weeks. The company says image recognition can help with a range of everyday tasks, and users can circle parts of an image to direct ChatGPT's attention. OpenAI has not disclosed technical details of the multimodal functionality, though there is speculation that it may use CLIP to align image and text representations. The voice feature offers five synthetic voices for ChatGPT's replies, while OpenAI's Whisper speech recognition system transcribes the user's spoken input.
- ChatGPT update enables its AI to “see, hear, and speak,” according to OpenAI Ars Technica
- ChatGPT Can Now Chat Aloud With You (And Yes, It Sounds Pretty Much Human) The Wall Street Journal
- ChatGPT can now 'speak,' listen and process images, OpenAI says CNBC
- ChatGPT Can Now Respond With Spoken Words The New York Times
- OpenAI’s ChatGPT chatbot now supports prompting with voice and images The Verge