OpenAI's new GPT-4o vision feature in ChatGPT demonstrates impressive image recognition, object detection, and scene understanding, outperforming previous models at describing images accurately and flagging AI-generated content. Because the model is natively multimodal, it can reason across media types, pointing to future applications in smart glasses and other wearable technologies.
A tech journalist tested several AI models' ability to recognize the ingredients in a photo of random foodstuffs and turn them into recipes. ChatGPT was best at identifying what was in the image, while Claude produced the more appealing recipes.
Google has updated its Lens feature with new AI tools, allowing users to take a photo and ask questions about it, such as identifying a board game or a dish at a restaurant. The update enables users to ask specific questions about the photo without needing to describe its contents. The feature is available on the Google app for Android and iOS, and users can upload images from their phones to use the tool.
Swarovski Optik has unveiled the AX Visio, the world's first smart binoculars equipped with an NPU for powerful image recognition, allowing for quick and reliable identification of various animal species. The binoculars offer 10x magnification, a 32mm objective lens, and the ability to take photos and videos, with a resolution of 13MP for photos and Full HD for videos. The accompanying smartphone app enables easy photo management and live sharing of observations, while the binoculars also feature a compass function and up to 15 hours of battery life. Priced at $4,799, the AX Visio represents a significant advancement in animal observation technology.
Microsoft is introducing a new "Add a screenshot" feature to Windows 11 Copilot, allowing users to capture the screen and ask the AI to explain it. The feature is rolling out to the general public, enabling users to upload screenshots directly to Copilot or Bing and ask Bing Chat to discuss them. Additionally, select users will have access to GPT-4 Turbo, allowing them to ask Copilot to explain the emotions conveyed in abstract pictures. Microsoft plans to expand the rollout in the coming weeks and aims to make Copilot a central feature across its products, including Office and Windows, with plans to add a dedicated Copilot key to Windows hardware in 2024.
Meta has launched a free AI image generator that requires users to have a Meta account. The AI image generator site allows users to describe an image for Meta AI to generate, but the quality of the generated images has been criticized, with many appearing like poorly photoshopped creations. Meta used public images from Facebook and Instagram to train the AI, and while the company adds a visible watermark to the images, it can easily be removed.
OpenAI has introduced new features to its chatbot, ChatGPT, allowing users to upload images and ask questions based on them. The image feature can generate recipe suggestions, transcribe handwritten notes, identify objects, and even write code from whiteboard instructions. However, the feature has privacy guardrails in place to prevent the identification of humans. While the new capabilities offer practical uses, concerns about privacy and the potential misuse of personal photos have been raised.
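The article describes the consumer ChatGPT app, but the same image-understanding capability is exposed through OpenAI's Chat Completions API, where a question and an image travel together in one multimodal message. A minimal sketch of building such a message (the model name in the comment and the placeholder image bytes are illustrative assumptions, not values from the article):

```python
import base64

def build_image_question(image_bytes: bytes, question: str) -> dict:
    """Pair a text question with an image in the API's multimodal
    message format, embedding the image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real PNG file's contents.
msg = build_image_question(b"\x89PNG...", "What ingredients are in this photo?")
# The message would then be sent with something like:
# client.chat.completions.create(model="gpt-4o", messages=[msg])
```

Keeping the image inline as a data URL avoids hosting the file anywhere, which matches the upload-and-ask flow the article describes.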
OpenAI's ChatGPT now has a new feature called GPT-4V, which allows the AI chatbot to read and respond to image prompts. Users can upload images to the ChatGPT app and ask questions related to the image. The system uses reinforcement learning from human feedback to generate responses. While GPT-4V shows promise, there are still concerns about its accuracy and potential biases. Users have already started experimenting with the feature, using it for tasks like getting a second opinion on artwork, identifying obscure images, writing code, and interpreting diagrams. OpenAI is also investing in improving its Dall-E image generator and plans to integrate it into ChatGPT.
OpenAI is adding voice and image capabilities to its ChatGPT platform, allowing users to have voice conversations with the chatbot and upload images for analysis. While some users have celebrated the update, others have raised concerns about the potential for AI becoming too human-like, copyright violations, data leaks, and the replacement of smaller AI startups and educators. There are also concerns about the threat of deepfakes, voice scams, identity theft, and the bypassing of image verification CAPTCHA tests. OpenAI has acknowledged these risks and is taking measures to address them.
OpenAI's chatbot, ChatGPT, has introduced voice and image recognition capabilities, allowing users to interact with the AI through spoken questions and photo uploads. ChatGPT's voice is natural and calm, resembling that of a virtual assistant, and it can read responses aloud. The image recognition feature enables users to upload photos and ask questions, with ChatGPT recognizing the subject and providing relevant information. These new features are available to ChatGPT Plus and Enterprise members, with pricing starting at $20/month.
OpenAI has announced an update to its ChatGPT AI models, enabling them to analyze images and engage in spoken conversations. The image recognition feature allows users to upload images for discussion, while the speech synthesis feature enables back-and-forth spoken interactions with ChatGPT. OpenAI plans to roll out these features to Plus and Enterprise subscribers over the next two weeks. The company claims that image recognition is useful for a range of everyday tasks, and users can circle parts of an image to focus the model's attention. Technical details of the multimodal functionality are not disclosed, but it is speculated that OpenAI may use CLIP to align image and text representations. The spoken-conversation feature pairs a text-to-speech system offering five synthetic voices with OpenAI's Whisper model for speech recognition.
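If the CLIP speculation is right, alignment would mean embedding images and text into a shared vector space and scoring matches by cosine similarity. A toy sketch of that scoring step (the embedding vectors here are hand-made stand-ins; in a real CLIP model they would come from trained image and text encoders):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for encoder outputs: CLIP-style training pushes
# matching image/caption pairs to score highest in this space.
image_embedding = np.array([0.9, 0.1, 0.0])
caption_embeddings = {
    "a photo of a dog": np.array([0.8, 0.2, 0.1]),
    "a photo of a car": np.array([0.0, 0.1, 0.9]),
}

scores = {caption: cosine_similarity(image_embedding, emb)
          for caption, emb in caption_embeddings.items()}
best = max(scores, key=scores.get)  # caption best aligned with the image
```

The same ranking trick is what lets a CLIP-style model do zero-shot classification: candidate labels are just captions competing for the highest similarity score.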
Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system called Masked Generative Encoder (MAGE) that combines image recognition and generation capabilities. MAGE uses semantic tokens to represent images and employs a masked token modeling technique to fill in missing parts of an image, enabling it to understand image patterns and generate new images. The system has potential applications in object identification, image classification, few-shot learning, and image editing. MAGE achieved impressive results in generating realistic images and recognition tasks, setting new records in both areas. However, the researchers acknowledge that there is still room for improvement, particularly in compressing images without losing important details.
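The masked token modeling idea can be illustrated with a toy sketch: represent an image as a grid of semantic token ids, hide a random fraction of them, and train a model to predict the hidden ids. The grid size, vocabulary size, and mask ratio below are illustrative assumptions, not MAGE's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a 4x4 grid of semantic token ids, as a tokenizer
# (e.g. a VQ-GAN-style encoder) might produce for one image.
tokens = rng.integers(0, 1024, size=(4, 4))

MASK_ID = -1
mask_ratio = 0.5  # MAGE trains across a range of ratios: high ratios
                  # favour generation, low ratios favour recognition.

flat = tokens.flatten().copy()
n_masked = int(mask_ratio * flat.size)
masked_positions = rng.choice(flat.size, size=n_masked, replace=False)
flat[masked_positions] = MASK_ID
masked_grid = flat.reshape(tokens.shape)

# A trained transformer would now predict the original token ids at
# the masked positions; at a 100% mask ratio the same task becomes
# pure image generation from scratch.
```

Varying the mask ratio is what lets a single model span both ends of the recognition–generation spectrum the researchers describe.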
Optic's free web-based app, "AI or Not," claims to identify images generated by artificial intelligence (AI) by uploading them or providing a URL. The platform uses advanced algorithms and machine learning techniques to analyze images and detect signs of AI generation. While it was successful in identifying some AI-generated images, it struggled with others, indicating that it still has room for improvement. Optic positions its service as a tool to help users identify AI-generated images to avoid issues such as fraud or misinformation.
Google Lens has updated its image recognition feature to identify skin conditions by searching for visually similar images. However, the app warns that search results are informational only and not a diagnosis, and users should consult a medical authority for advice. The feature also works for other body conditions, such as hair loss or bumps on the lip. Google Lens also offers other capabilities, including animal, plant, and landmark identification, automatic translation overlays, homework help, and product suggestions based on image similarity.
Meta has released an artificial intelligence model called Segment Anything Model (SAM) that can identify individual objects within images and videos, even if it has not encountered them before. Users can select objects by clicking on them or by entering text prompts. Meta has also released a dataset of image annotations, which it claims is the largest of its kind. The SAM model and dataset will be available for download under a non-commercial license.
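SAM's interaction model (click a point, get back a mask for the object under it) can be illustrated with a deliberately simple stand-in: a flood fill that grows a region of similar pixels from the clicked seed. This is only a toy analogue of the interface; the real SAM predicts masks with a neural network, not a flood fill:

```python
from collections import deque
import numpy as np

def mask_from_click(image: np.ndarray, seed: tuple, tol: int = 10) -> np.ndarray:
    """Toy point-prompted segmentation: flood-fill the region of
    pixels within `tol` of the clicked pixel's value."""
    h, w = image.shape
    target = int(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(int(image[y, x]) - target) > tol:
            continue
        mask[y, x] = True
        queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask

# A bright 4x4 "object" on a dark background; clicking inside it
# returns a mask covering exactly that object.
img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 200
obj_mask = mask_from_click(img, (3, 3))
```

The point of SAM is that it delivers this click-to-mask behaviour for arbitrary real-world objects it was never explicitly trained to recognize, which no hand-written rule like the one above can do.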