An ethics journal article about whistleblowing in Ethiopia was found to contain at least 19 fabricated references, many generated by AI tools like ChatGPT, raising concerns about the reliability of scientific publications and the need for better verification methods.
The article compares ChatGPT and Google Gemini on their ability to provide accurate information, highlighting ChatGPT's tendency to hallucinate or invent facts and Gemini's more critical, sometimes sarcastic responses, which frequently corrected or scolded ChatGPT's inaccuracies during fact-checking tests across a range of topics.
Google has removed its open Gemma AI model from AI Studio following a complaint from Senator Marsha Blackburn, who claimed the model generated false accusations against her. The move appears to be a response to concerns about AI hallucinations and potential misuse, with Google emphasizing ongoing efforts to reduce such errors while restricting non-developer access to prevent inflammatory outputs.
The article discusses the risks of bias, hallucinations, and opacity in AI systems used in health research, highlighting how these issues threaten the reliability and trustworthiness of medical data and findings, and proposing solutions like transparency and better oversight to mitigate these risks.
An OpenAI researcher claimed that GPT-5 had solved multiple longstanding Erdős problems, but the claim was based on misinterpretations and miscommunications, highlighting the gap between AI's actual capabilities and the surrounding hype, especially in mathematics and literature search. The incident underscores the importance of making cautious claims and understanding AI's limitations.
Deloitte announced a major AI deal with Anthropic and plans to deploy the Claude chatbot across its global workforce, signaling a strong commitment to AI even as the firm refunds a government client for a report containing AI-generated errors and other companies face similar problems with AI accuracy.
A new OpenAI study explores why large language models like GPT-5 hallucinate, attributing the problem partly to the next-word-prediction training objective and to evaluation methods that reward guessing over admitting uncertainty. The researchers suggest updating evaluation metrics to penalize confident errors and stop incentivizing guesses, aiming to reduce hallucinations.
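A minimal sketch, not taken from the paper, of the kind of scoring change the researchers describe: under plain accuracy a model that always guesses beats one that admits uncertainty, while a score that penalizes confident errors and treats abstention as neutral removes that incentive. The function names and point values here are illustrative assumptions.

```python
def accuracy_score(answer: str | None, truth: str) -> float:
    """Binary accuracy: abstaining (answer is None) scores the same as a wrong guess,
    so guessing on every question maximizes expected score."""
    return 1.0 if answer == truth else 0.0


def penalized_score(answer: str | None, truth: str, wrong_penalty: float = -1.0) -> float:
    """Confidence-aware scoring: abstention is neutral (0), a confident wrong answer
    costs points, so guessing only pays off when the model is likely to be right."""
    if answer is None:  # model declined to answer
        return 0.0
    return 1.0 if answer == truth else wrong_penalty
```

For example, a model that is right 30% of the time has an expected score of 0.3 per guess under accuracy_score, but 0.3 - 0.7 = -0.4 per guess under penalized_score, so abstaining (score 0) becomes the better strategy.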
The article offers new insights and tips for prompt engineering with GPT-5, emphasizing that traditional prompting techniques still work despite new features such as the auto-switcher, which can complicate model selection. It presents strategies for influencing model routing, improving output quality, reducing hallucinations, and using personas, reaffirming that prompt engineering remains a vital skill when working with AI.
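As a concrete illustration of those tips, here is a minimal sketch assuming the standard OpenAI Python client and a "gpt-5" model identifier; the persona, the explicit permission to say "I don't know," and the "think carefully" routing nudge are examples of the kinds of strategies the article describes, not verbatim recommendations from it.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5",  # assumed model identifier
    messages=[
        {
            "role": "system",
            "content": (
                "You are a meticulous research librarian. "  # persona
                "Cite only sources you are certain exist. "  # hallucination guard
                "If you are unsure, say 'I don't know' rather than guessing."
            ),
        },
        {
            "role": "user",
            "content": (
                "Think carefully before answering: "  # claimed nudge toward the deeper reasoning route
                "list three peer-reviewed papers on AI hallucination benchmarks."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```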
The article discusses the concept of the 'Slopocene,' a period characterized by low-quality AI-generated content, and argues that AI failures can reveal insights into how these systems work. It advocates deliberately 'breaking' AI models to expose their biases, decision processes, and limitations, thereby fostering critical AI literacy and a deeper understanding of the technology.
Google's new AI Overview feature generates written answers to user searches, raising questions about legal responsibility if the AI provides incorrect or harmful information. The legal protections under Section 230 of the Communications Decency Act, which shield companies from liability for third-party content, may not clearly apply to AI-generated content. The reliability of AI Overview's answers varies, and the feature's impact on the creation and recognition of reliable information is also a concern.
Nvidia CEO Jensen Huang discussed the potential timeline for achieving artificial general intelligence (AGI), suggesting it could arrive within five years if specific tests are defined. He also addressed AI hallucinations, proposing that models research and fact-check their answers before responding, particularly for mission-critical questions.
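The "research first, then answer" idea can be illustrated with a toy retrieve-then-answer loop; this is not Nvidia's implementation, and the function names and keyword-overlap retrieval are placeholder assumptions.

```python
def retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    """Toy retrieval step: return documents sharing at least one keyword with the query."""
    terms = set(query.lower().split())
    return [text for text in corpus.values() if terms & set(text.lower().split())]


def answer_with_sources(query: str, corpus: dict[str, str]) -> str:
    """Refuse to answer a mission-critical question unless some source supports it."""
    sources = retrieve(query, corpus)
    if not sources:  # nothing to ground the answer in
        return "I can't verify an answer to that question."
    # A real system would now prompt the model with these passages and require citations.
    return f"Answer drafted from {len(sources)} retrieved source(s)."
```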
OpenAI is developing a new method for training AI models to combat AI "hallucinations," which occur when models fabricate information outright. The approach, called "process supervision," rewards models for each individual correct step of reasoning on the way to an answer, rather than only rewarding a correct final conclusion. This could make AI more explainable and help address concerns about misinformation and incorrect results. OpenAI has also released an accompanying dataset of 800,000 human labels used to train the model described in the research paper. However, some experts remain skeptical and call for more transparency and accountability in the field.
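To make the contrast concrete, here is a toy sketch, not OpenAI's training code, of outcome supervision versus process supervision; the 0/1 step scores and the external step verifier are illustrative assumptions.

```python
from typing import Callable


def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Outcome supervision: a single reward for the final conclusion only."""
    return 1.0 if final_answer == correct_answer else 0.0


def process_reward(steps: list[str], step_is_correct: Callable[[str], bool]) -> float:
    """Process supervision: credit accrues per verified reasoning step, so a chain
    that goes wrong early earns little even if the final guess happens to be right."""
    credited = 0
    for step in steps:
        if not step_is_correct(step):  # a (human or learned) verifier flags a bad step
            break                      # stop crediting the remainder of the chain
        credited += 1
    return credited / max(len(steps), 1)  # normalize to [0, 1]
```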