Tag: Multimodal

All articles tagged with #multimodal

Apple Develops Unified AI Model for Vision, Creation, and Editing

Originally Published 24 days ago — by 9to5Mac

Apple researchers have developed UniGen-1.5, a unified AI model that understands, generates, and edits images within a single system. It extends previous versions with new editing capabilities and improved instruction alignment, and it achieves state-of-the-art performance on several benchmarks, though it still struggles with text generation and identity consistency.

Apple Unveils 2025 Foundation Language Models Report

Originally Published 5 months ago — by Apple Machine Learning Research

Apple has developed two advanced multilingual, multimodal foundation language models: a 3-billion-parameter on-device model optimized for Apple silicon and a scalable server model using a novel PT-MoE transformer, both supporting multiple languages and image understanding. These models power Apple Intelligence features across devices and services, with a focus on responsible AI, privacy, and developer integration through a new Swift-based framework. They outperform comparable open models in benchmarks and human evaluations, enhancing user experiences with efficient, accurate, and responsible AI capabilities.

Google AI Mode Launches in India as First International Expansion

Originally Published 6 months ago — by 9to5Google

Google is expanding its AI Mode feature to India, marking its first international launch after debuting in the US. Powered by Gemini 2.5, AI Mode enhances search with advanced reasoning, multimodal inputs including voice and images, and real-time data sources. Users in India can now access AI Mode via the Google app or web, enabling more interactive and detailed search experiences.

Google's Gemini AI: Advancements, Features, and Future in Robotics and Assistant Technology

Originally Published 7 months ago — by Tom's Guide

While ChatGPT remains popular, Google's Gemini outperforms it in deep research, Google Workspace integration, and real-time web access, making it the stronger choice for users deeply embedded in the Google ecosystem. Both tools remain valuable for different tasks.

Mistral Unveils Devstral and Agent Frameworks for AI Coding and Enterprise Solutions

Originally Published 7 months ago — by VentureBeat

Mistral AI has launched an API enabling developers to create customizable AI agents capable of tasks like code execution, image generation, and web search, with features supporting complex workflows and real-time interactions, aimed at enterprise and developer use. The API enhances AI capabilities beyond traditional language models by integrating real-world data sources and managing multiple agents, positioning Mistral as a key player in enterprise AI solutions. However, the proprietary nature of the models and API may influence adoption decisions.
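
For readers curious what working with such an agents endpoint might look like in practice, the sketch below shows the general shape of creating a tool-equipped agent with Mistral's Python client. The method paths (`client.beta.agents.create`, `client.beta.conversations.start`) and tool type strings are assumptions based on this announcement rather than verified documentation, so treat it as illustrative only.

```python
# Illustrative sketch only: the method names and tool identifiers below are
# assumptions based on the announcement, not verified Mistral documentation.
import os
from mistralai import Mistral  # pip install mistralai

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Create an agent with built-in tools for code execution and web search
# (the exact tool type strings are assumed).
agent = client.beta.agents.create(
    model="mistral-medium-latest",
    name="research-helper",
    instructions="Answer questions, running code or searching the web when useful.",
    tools=[{"type": "code_interpreter"}, {"type": "web_search"}],
)

# Start a conversation with the agent and print whatever it returns.
response = client.beta.conversations.start(
    agent_id=agent.id,
    inputs="Plot the first 20 Fibonacci numbers and describe the growth rate.",
)
print(response)
```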

Google's Gemini 2.0: Pioneering the Agentic AI Era

Originally Published 1 year ago — by Ars Technica

Google has launched Gemini 2.0, an advanced AI model capable of generating text, images, and speech while processing various input types. The Gemini 2.0 Flash model, part of this new family, offers better performance and speed than its predecessor. The model is initially available to developers, with its full feature set reaching early-access partners by January 2025. Google is integrating the technology into its products and has implemented SynthID watermarking to deter misuse of AI-generated content. The company emphasizes the development of "agentic" AI systems that can perform tasks autonomously under user supervision.

Google's Gemini 2.0: Pioneering AI with Text, Image, and Speech Generation

Originally Published 1 year ago — by TechCrunch

Google has unveiled Gemini 2.0 Flash, its latest AI model capable of generating text, images, and audio, and interacting with third-party apps. The model, which is twice as fast as its predecessor, will initially be available to early access partners, with a broader release planned for January. It features enhanced capabilities in coding and image analysis, and uses SynthID technology to watermark outputs to prevent misuse. Google is also launching the Multimodal Live API for developers to create real-time apps with audio and video streaming.
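
As a rough illustration of how developers can call Gemini 2.0 Flash, here is a minimal sketch using Google's google-genai Python SDK. The model identifier may vary by release stage, and the real-time audio/video streaming mentioned above goes through the separate Multimodal Live API, which is not shown here.

```python
# pip install google-genai
import os
from google import genai

# The client authenticates with an API key from Google AI Studio.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# A plain text request to Gemini 2.0 Flash; the contents argument can also
# carry image parts for multimodal prompts. The model ID is illustrative and
# may differ depending on the release stage.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize what a multimodal model is in two sentences.",
)
print(response.text)
```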

OpenAI Hires Leading Engineers from DeepMind

Originally Published 1 year ago — by WIRED

OpenAI has hired three senior engineers from Google DeepMind to work on multimodal AI at its new Zurich office. The hires, Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai, reflect the intense competition among AI companies to secure top talent. OpenAI, known for its advancements in multimodal AI, is expanding globally with new offices planned in several cities. The move comes amid a broader trend of high-profile talent shifts in the AI industry, as companies like Microsoft and Google also engage in aggressive recruitment strategies.

"Anthropic Unveils Superior AI Chatbot to Rival OpenAI and Google"

Originally Published 1 year ago — by The Verge

Anthropic, founded by former OpenAI employees, has introduced the Claude 3 family of AI models, claiming they outperform leading models from Google and OpenAI. The models are multimodal, able to process both text and photo inputs. The family, comprising Haiku, Sonnet, and Opus, offers improved contextual understanding, faster processing, and better benchmark results than previous versions and competitors. Trained on a mix of datasets, Claude 3 will be available on AWS's Bedrock and Google's Vertex AI.
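
To make the multimodal claim concrete, here is a minimal sketch of sending an image alongside a text question to Claude 3 Opus with the anthropic Python SDK; the model ID and file name are illustrative, and similar requests can also be routed through Bedrock or Vertex AI instead of Anthropic's own API.

```python
# pip install anthropic
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Encode a local image as base64 so it can be sent inline with the prompt.
with open("chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative Claude 3 Opus model ID
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Summarize what this chart shows."},
            ],
        }
    ],
)
print(message.content[0].text)
```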

"Anthropic Unveils Cutting-Edge Chatbot in Escalating Generative AI Competition"

Originally Published 1 year ago — by CNBC

Anthropic, backed by Google, has launched Claude 3, a suite of AI models including Opus, Sonnet, and Haiku, with Opus outperforming OpenAI's GPT-4 and Google's Gemini Ultra on benchmarks. The new models offer multimodal support, letting users upload various types of data for analysis. Anthropic has grown rapidly in the generative AI field, securing significant funding and competing with ChatGPT, amid a broader surge in investment and adoption despite concerns about bias propagation. Claude 3 can summarize long documents, with a context window of about 200,000 tokens (roughly 150,000 words), and shows an improved understanding of risk in its responses. Multimodality in AI presents new opportunities and risks, as seen when Google's AI image generator was taken offline over historical inaccuracies and questionable responses.

"Google's Gemini AI: The Latest Advancements and Applications Unveiled"

Originally Published 1 year ago — by TechCrunch

Google has introduced Gemini, a suite of generative AI models, apps, and services developed by Google's AI research labs DeepMind and Google Research. Gemini comes in three flavors: Ultra, Pro, and Nano, and is trained to be natively multimodal, working with audio, images, videos, and text in different languages. While promising, Gemini has faced criticism for underdelivering in some areas. It is available through platforms such as Vertex AI and AI Studio, and will be integrated into devices like the Pixel 8 Pro. Gemini Pro is free to use in the Gemini apps and AI Studio for now, but usage will be billed once it exits preview in Vertex AI.

"Unveiling Google Gemini: The Next Generation AI Platform"

Originally Published 2 years ago — by TechCrunch

Google has introduced Gemini, a new generative AI platform developed by DeepMind and Google Research, which comes in three models: Ultra, Pro, and Nano. Gemini is multimodal, capable of working with text, audio, images, and videos. It is distinct from Bard, which is an interface for accessing certain Gemini models. Gemini's capabilities include transcribing speech, captioning images and videos, and generating artwork. Gemini Pro is available publicly and can be accessed through Bard, Vertex AI, and AI Studio. Gemini Nano is a smaller version designed to run on mobile devices and is currently available on the Pixel 8 Pro. Google claims Gemini's superiority over OpenAI's GPT-4, but early impressions have raised concerns about its performance. Gemini Pro will be free to use in Bard and AI Studio, but will have a cost in Vertex AI once it exits preview.

"Google Unveils Gemini: The AI Model Set to Outperform GPT-4"

Originally Published 2 years ago — by Ars Technica

Google has unveiled Gemini, a multimodal AI model family aimed at rivaling OpenAI's GPT-4. Google claims that the largest version of Gemini surpasses current state-of-the-art results on 30 out of 32 widely used academic benchmarks. Gemini can handle multiple types of input, including text, code, images, and audio, making it a versatile AI model. Google plans to integrate Gemini into its products and believes it will revolutionize computing. The model is available in three sizes, with the mid-level model currently accessible to the public. Google also highlights Gemini's scalability and efficiency when running on its custom Tensor Processing Units (TPUs).

"Google's Gemini: The Next Generation AI Model Set to Revolutionize Search and Defeat GPT-4"

Originally Published 2 years ago — by WIRED

Google's AI division, DeepMind, has announced the launch of Gemini, a new AI model that aims to revolutionize the field of artificial intelligence. Gemini is described as a "multimodal" model, capable of processing information in the form of text, audio, images, and video. It represents a significant step forward in creating AI models inspired by the way humans interact and understand the world through their senses. Gemini outperforms GPT-4, the model behind OpenAI's ChatGPT, on several benchmarks and showcases complex reasoning and the ability to combine information from different modalities. Google is also exploring the integration of Gemini with robotics to enable physical interaction with the world. The company is focused on advancing the reasoning abilities of AI models and is working on safety and responsibility tests for the upcoming release of the most powerful version of Gemini, Ultra.

Google's Bard AI Chatbot Now Widely Available with New Features

Originally Published 2 years ago — by TechCrunch

Google has removed most waitlist restrictions and made its generative AI chatbot, Bard, widely available in English across 180 countries and territories. The company plans to expand to the top 40 languages soon and is adding multimodal content to Bard, allowing it to deliver answers in more than just text. Google describes Bard as an experiment that answers questions in natural language and says it is taking a responsible approach to its development.