Tag

Multimodal Capabilities

All articles tagged with #multimodal capabilities

technology • 1 year ago • 12 min saved

Google Launches Gemini 2.0 AI Model for Next-Gen Agents

Google DeepMind has launched Gemini 2.0, an advanced AI model designed for the "agentic era," featuring enhanced multimodal capabilities, including native image and audio output and tool use. The Gemini 2.0 Flash model is currently available to developers and trusted testers, with broader access planned for early next year. Google is exploring new agentic experiences with projects like Astra, Mariner, and Jules, while emphasizing responsible AI development with a focus on safety and security.

via The Keyword

#agentic-era #ai #gemini-20

technology • 2 years ago • 1 min saved

"ChatGPT's New 'Read Aloud' Feature: Enhancing Accessibility and User Experience"

OpenAI has added a Read Aloud feature to ChatGPT that reads responses aloud in one of five voices. It is available on the web and in the mobile apps, supports 37 languages, and automatically detects the language of the text. The feature works with both GPT-4 and GPT-3.5 and extends OpenAI's multimodal capabilities. Users can have ChatGPT read a written answer aloud or set it to always respond verbally. In the mobile apps the readout is controlled through a player interface, while on the web a speaker icon appears below the text.

via The Verge

#ai #chatgpt #multimodal-capabilities