Tag: Multimodal

All articles tagged with #multimodal

Apple Develops Unified AI Model for Vision, Creation, and Editing

Originally Published 24 days ago — by 9to5Mac

Apple researchers have developed UniGen-1.5, a unified AI model that understands, generates, and edits images within a single system. It extends previous versions with new editing capabilities and improved instruction alignment, and it achieves state-of-the-art performance on several benchmarks, though it still struggles with text generation and identity consistency.

Apple Unveils 2025 Foundation Language Models Report

Originally Published 5 months ago — by Apple Machine Learning Research

Apple has developed two advanced multilingual, multimodal foundation language models: a 3-billion-parameter on-device model optimized for Apple silicon and a scalable server model using a novel PT-MoE transformer, both supporting multiple languages and image understanding. These models power Apple Intelligence features across devices and services, with a focus on responsible AI, privacy, and developer integration through a new Swift-based framework. They outperform comparable open models in benchmarks and human evaluations, enhancing user experiences with efficient, accurate, and responsible AI capabilities.

Google AI Mode Launches in India as First International Expansion

Originally Published 6 months ago — by 9to5Google

Google is expanding its AI Mode feature to India, marking its first international launch after debuting in the US. Powered by Gemini 2.5, AI Mode enhances search with advanced reasoning, multimodal inputs including voice and images, and real-time data sources. Users in India can now access AI Mode via the Google app or web, enabling more interactive and detailed search experiences.

Google's Gemini AI: Advancements, Features, and Future in Robotics and Assistant Technology

Originally Published 7 months ago — by Tom's Guide

While ChatGPT remains popular, Google's Gemini outperforms it in deep research, Google Workspace integration, and real-time web access, making it the stronger choice for users deeply embedded in the Google ecosystem. Both tools remain valuable for different tasks.

Mistral Unveils Devstral and Agent Frameworks for AI Coding and Enterprise Solutions

Originally Published 7 months ago — by VentureBeat

Mistral AI has launched an API enabling developers to create customizable AI agents capable of tasks like code execution, image generation, and web search, with features supporting complex workflows and real-time interactions, aimed at enterprise and developer use. The API enhances AI capabilities beyond traditional language models by integrating real-world data sources and managing multiple agents, positioning Mistral as a key player in enterprise AI solutions. However, the proprietary nature of the models and API may influence adoption decisions.
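
For readers curious what working with such an agents endpoint might look like in practice, the sketch below shows the general shape of creating a tool-equipped agent with Mistral's Python client. The method paths (`client.beta.agents.create`, `client.beta.conversations.start`) and tool type strings are assumptions based on this announcement rather than verified documentation, so treat it as illustrative only.

```python
# Illustrative sketch only: the method names and tool identifiers below are
# assumptions based on the announcement, not verified Mistral documentation.
import os
from mistralai import Mistral  # pip install mistralai

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Create an agent with built-in tools for code execution and web search
# (the exact tool type strings are assumed).
agent = client.beta.agents.create(
    model="mistral-medium-latest",
    name="research-helper",
    instructions="Answer questions, running code or searching the web when useful.",
    tools=[{"type": "code_interpreter"}, {"type": "web_search"}],
)

# Start a conversation with the agent and print whatever it returns.
response = client.beta.conversations.start(
    agent_id=agent.id,
    inputs="Plot the first 20 Fibonacci numbers and describe the growth rate.",
)
print(response)
```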

Google's Gemini 2.0: Pioneering the Agentic AI Era

Originally Published 1 year ago — by Ars Technica

Google has launched Gemini 2.0, an advanced AI model capable of generating text, images, and speech while processing various input types. The Gemini 2.0 Flash model, part of this new family, offers better performance and speed than its predecessor. The model is initially available to developers, with its full feature set reaching early-access partners by January 2025. Google is integrating the technology into its products and has implemented SynthID watermarking to deter misuse of AI-generated content. The company emphasizes the development of "agentic" AI systems that can perform tasks autonomously under user supervision.

Google's Gemini 2.0: Pioneering AI with Text, Image, and Speech Generation

Originally Published 1 year ago — by TechCrunch

Google has unveiled Gemini 2.0 Flash, its latest AI model capable of generating text, images, and audio, and interacting with third-party apps. The model, which is twice as fast as its predecessor, will initially be available to early access partners, with a broader release planned for January. It features enhanced capabilities in coding and image analysis, and uses SynthID technology to watermark outputs to prevent misuse. Google is also launching the Multimodal Live API for developers to create real-time apps with audio and video streaming.
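
As a rough illustration of how developers can call Gemini 2.0 Flash, here is a minimal sketch using Google's google-genai Python SDK. The model identifier may vary by release stage, and the real-time audio/video streaming mentioned above goes through the separate Multimodal Live API, which is not shown here.

```python
# pip install google-genai
import os
from google import genai

# The client authenticates with an API key from Google AI Studio.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# A plain text request to Gemini 2.0 Flash; the contents argument can also
# carry image parts for multimodal prompts. The model ID is illustrative and
# may differ depending on the release stage.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize what a multimodal model is in two sentences.",
)
print(response.text)
```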

OpenAI Hires Leading Engineers from DeepMind

Originally Published 1 year ago — by WIRED

OpenAI has hired three senior engineers from Google DeepMind to work on multimodal AI at its new Zurich office. The hires, Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai, reflect the intense competition among AI companies to secure top talent. OpenAI, known for its advancements in multimodal AI, is expanding globally with new offices planned in several cities. The move comes amid a broader trend of high-profile talent shifts in the AI industry, as companies like Microsoft and Google also engage in aggressive recruitment strategies.

"Anthropic Unveils Superior AI Chatbot to Rival OpenAI and Google"

Originally Published 1 year ago — by The Verge

Anthropic, founded by former OpenAI employees, has introduced the Claude 3 family of AI models, claiming they outperform leading models from Google and OpenAI. The models are multimodal, able to process both text and photo inputs. The family, comprising Haiku, Sonnet, and Opus, offers improved contextual understanding, faster processing, and better benchmark results than previous versions and competitors. Trained on a mix of datasets, Claude 3 will be available on AWS's Bedrock and Google's Vertex AI.
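
To make the multimodal claim concrete, here is a minimal sketch of sending an image alongside a text question to Claude 3 Opus with the anthropic Python SDK; the model ID and file name are illustrative, and similar requests can also be routed through Bedrock or Vertex AI instead of Anthropic's own API.

```python
# pip install anthropic
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Encode a local image as base64 so it can be sent inline with the prompt.
with open("chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative Claude 3 Opus model ID
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Summarize what this chart shows."},
            ],
        }
    ],
)
print(message.content[0].text)
```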

"Anthropic Unveils Cutting-Edge Chatbot in Escalating Generative AI Competition"

Originally Published 1 year ago — by CNBC

Anthropic, backed by Google, has launched Claude 3, a suite of AI models including Opus, Sonnet, and Haiku, with Opus outperforming OpenAI's GPT-4 and Google's Gemini Ultra on benchmarks. The new models offer multimodal support, letting users upload various types of data for analysis. Anthropic has grown rapidly in the generative AI field, securing significant funding and competing with ChatGPT, amid a broader surge in investment and adoption despite concerns about bias propagation. Claude 3 can summarize long documents, with a context window of about 200,000 tokens (roughly 150,000 words), and shows an improved understanding of risk in its responses. Multimodality in AI presents new opportunities and risks, as seen when Google's AI image generator was taken offline over historical inaccuracies and questionable responses.

"Google's Gemini AI: The Latest Advancements and Applications Unveiled"

Originally Published 1 year ago — by TechCrunch

Google has introduced Gemini, a suite of generative AI models, apps, and services developed by Google's AI research labs DeepMind and Google Research. Gemini comes in three flavors: Ultra, Pro, and Nano, and is trained to be natively multimodal, working with audio, images, videos, and text in different languages. While promising, Gemini has faced criticism for underdelivering in some areas. It is available through platforms such as Vertex AI and AI Studio, and will be integrated into devices like the Pixel 8 Pro. Gemini Pro is free to use in the Gemini apps and AI Studio for now, but usage will be billed once it exits preview in Vertex AI.

"Unveiling Google Gemini: The Next Generation AI Platform"

Originally Published 2 years ago — by TechCrunch

Google has introduced Gemini, a new generative AI platform developed by DeepMind and Google Research, which comes in three models: Ultra, Pro, and Nano. Gemini is multimodal, capable of working with text, audio, images, and videos. It is distinct from Bard, which is an interface for accessing certain Gemini models. Gemini's capabilities include transcribing speech, captioning images and videos, and generating artwork. Gemini Pro is available publicly and can be accessed through Bard, Vertex AI, and AI Studio. Gemini Nano is a smaller version designed to run on mobile devices and is currently available on the Pixel 8 Pro. Google claims Gemini's superiority over OpenAI's GPT-4, but early impressions have raised concerns about its performance. Gemini Pro will be free to use in Bard and AI Studio, but will have a cost in Vertex AI once it exits preview.

"Google Unveils Gemini: The AI Model Set to Outperform GPT-4"

Originally Published 2 years ago — by Ars Technica

Google has unveiled Gemini, a multimodal AI model family aimed at rivaling OpenAI's GPT-4. Google claims that the largest version of Gemini surpasses current state-of-the-art results on 30 out of 32 widely used academic benchmarks. Gemini can handle multiple types of input, including text, code, images, and audio, making it a versatile AI model. Google plans to integrate Gemini into its products and believes it will revolutionize computing. The model is available in three sizes, with the mid-level model currently accessible to the public. Google also highlights Gemini's scalability and efficiency when running on its custom Tensor Processing Units (TPUs).

"Google's Gemini: The Next Generation AI Model Set to Revolutionize Search and Defeat GPT-4"

Originally Published 2 years ago — by WIRED

Google's AI division, DeepMind, has announced the launch of Gemini, a new AI model that aims to revolutionize the field of artificial intelligence. Gemini is described as a "multimodal" model, capable of processing information in the form of text, audio, images, and video. It represents a significant step forward in creating AI models inspired by the way humans interact and understand the world through their senses. Gemini outperforms GPT-4, the model behind OpenAI's ChatGPT, on several benchmarks and showcases complex reasoning and the ability to combine information from different modalities. Google is also exploring the integration of Gemini with robotics to enable physical interaction with the world. The company is focused on advancing the reasoning abilities of AI models and is working on safety and responsibility tests for the upcoming release of the most powerful version of Gemini, Ultra.

Google's Bard AI Chatbot Now Widely Available with New Features

Originally Published 2 years ago — by TechCrunch

Google has removed most waitlist restrictions and made its generative AI chatbot, Bard, widely available in English across 180 countries and territories. The company plans to expand to the top 40 languages soon and is adding multimodal content to Bard, allowing it to deliver answers in more than just text. Google describes Bard as an experiment that answers questions in natural language and says it is taking a responsible approach to its development.