Tag

Ai Training Data

All articles tagged with #ai training data

Wikipedia signs paid-data deals with AI firms to fund its infrastructure
technology1 month ago

Wikipedia signs paid-data deals with AI firms to fund its infrastructure

The Wikimedia Foundation has begun paid data-access deals with AI firms including Amazon, Meta, Microsoft, Mistral AI, and Perplexity to monetize Wikipedia’s data and help cover rising infrastructure costs from automated scraping, signaling a shift from donation-based funding to enterprise partnerships; the foundation also envisions AI tools to assist editors and a conversational search experience that cites verified text.

Wikipedia opens paid licensing for AI training with major tech firms
technology1 month ago

Wikipedia opens paid licensing for AI training with major tech firms

The Wikimedia Foundation expanded Wikimedia Enterprise to offer paid licensing of Wikipedia content to major tech firms—Microsoft, Meta, Amazon, Perplexity, and Mistral AI—letting them use Wikipedia’s 65 million articles to train AI models, joining Google's existing deal. The move monetizes API access to offset rising infrastructure costs as AI scrapes increase, while editors push back on AI-generated content and bot traffic remains a concern.

ElevenLabs Debuts AI Music Generator in Industry Collaboration
technology6 months ago

ElevenLabs Debuts AI Music Generator in Industry Collaboration

ElevenLabs has launched an AI music generator that is claimed to be cleared for commercial use, expanding beyond its traditional text-to-speech tools. The company has shared samples of AI-generated music and announced partnerships with major music publishers to use their material for training, amid ongoing legal concerns about copyright infringement in AI music development.

US Authors Prepare Class-Action Lawsuit Against Anthropic Over AI Book Piracy
legal7 months ago

US Authors Prepare Class-Action Lawsuit Against Anthropic Over AI Book Piracy

A California federal judge has allowed a class-action lawsuit against Anthropic, alleging the AI company downloaded up to seven million copyrighted books from pirated sources to train its chatbot Claude, violating the Copyright Act. The lawsuit, filed by authors including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, joins a broader trend of legal actions against AI firms over copyright issues, with some cases focusing on unauthorized data use and others on licensing disputes.

"Report: Tumblr and WordPress Strike Deals to Sell User Data for AI Training"
technology2 years ago

"Report: Tumblr and WordPress Strike Deals to Sell User Data for AI Training"

Automattic, the owner of Tumblr and WordPress.com, is reportedly in talks with AI companies Midjourney and OpenAI to provide training data scraped from users' posts, potentially creating a new revenue stream for the site. The company plans to launch a new setting allowing users to opt out of data sharing with third parties, including AI companies. However, it's unclear what data has been sent to the AI companies and how it has been used. This move reflects a trend of companies striking deals with AI tool makers for training data, but it has also sparked concerns from the creative community about their work being used for training. Automattic has struggled to monetize Tumblr since acquiring it in 2019 and is seeking new avenues for revenue.

OpenAI Accuses New York Times of Hacking ChatGPT for Lawsuit Evidence
technology2 years ago

OpenAI Accuses New York Times of Hacking ChatGPT for Lawsuit Evidence

OpenAI alleges that The New York Times "hacked" its ChatGPT to generate copyright infringement examples for a lawsuit, claiming the media company used deceptive prompts violating OpenAI's terms of use. The filing comes amid a broader battle over using copyrighted material for AI training data, with OpenAI arguing it's "impossible" to train top AI models without copyrighted works. The company has been courting publishers for content partnerships and highlighted its opt-out process for publishers while actively engaging with them to gain access to materials for AI training.

"Reddit and Google Strike $60M AI Training Data Deal"
technology2 years ago

"Reddit and Google Strike $60M AI Training Data Deal"

Google has partnered with Reddit to access the platform's data API for AI training, allowing the tech giant to efficiently train models and display Reddit content across its products. The collaboration also grants Reddit access to Google's Vertex AI for improving search results. Despite previous tensions, the deal signifies a potential revenue stream for Reddit, which is preparing for an IPO and seeking to boost its valuation.

OpenAI Collaborates with Organizations to Enhance AI Training Data
artificial-intelligence2 years ago

OpenAI Collaborates with Organizations to Enhance AI Training Data

OpenAI has announced its Data Partnerships program, aiming to collaborate with third-party organizations to develop new data sets for training AI models. The initiative seeks to address the flaws and biases present in existing data sets, which can lead to harmful amplification by AI models. OpenAI plans to collect large-scale data sets that reflect human society and encompass various modalities, including images, audio, and video. The company is particularly interested in data that expresses human intention across different languages, topics, and formats. OpenAI will work with organizations to digitize training data and create both open source and private data sets. While the program aims to improve AI model understanding, concerns have been raised about potential bias and compensation for data owners.

"Google Empowers Publishers with Opt-Out Switch for AI Training Data"
technology2 years ago

"Google Empowers Publishers with Opt-Out Switch for AI Training Data"

Google has introduced a new tool called Google-Extended, which allows website publishers to opt out of having their data used to train the company's AI models while still being accessible through Google Search. The tool, available through robots.txt, enables publishers to control access to their content and manage whether their sites contribute to improving AI applications. This move comes as some sites have already blocked OpenAI's web crawler to prevent data scraping for AI training, and concerns have arisen about blocking Google's crawlers without affecting search indexing.