Tag

AI Training Data

All articles tagged with #ai training data

ElevenLabs Debuts AI Music Generator in Industry Collaboration

Originally Published 5 months ago — by TechCrunch

Source: TechCrunch

ElevenLabs has launched an AI music generator that the company says is cleared for commercial use, expanding beyond its traditional text-to-speech tools. The company has shared samples of AI-generated music and announced partnerships with major music publishers to use their material for training, amid ongoing legal concerns about copyright infringement in AI music development.

US Authors Prepare Class-Action Lawsuit Against Anthropic Over AI Book Piracy

Originally Published 5 months ago — by The Verge

Source: The Verge

A California federal judge has allowed a class-action lawsuit against Anthropic to proceed. The suit alleges that the AI company violated the Copyright Act by downloading up to seven million copyrighted books from pirated sources to train its chatbot, Claude. Filed by authors including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, it joins a broader trend of legal actions against AI firms over copyright issues, with some cases focusing on unauthorized data use and others on licensing disputes.

Reddit Sues Anthropic Over Data Use and Bot Access

Originally Published 7 months ago — by The Verge

Source: The Verge

Reddit has sued Anthropic in San Francisco, alleging that Anthropic's bots accessed Reddit more than 100,000 times since July 2024 despite Anthropic's claims that it had blocked them. The suit accuses the company of exploiting Reddit's content for AI training and comes amid a broader trend of legal actions against AI firms for copyright violations.

"Report: Tumblr and WordPress Strike Deals to Sell User Data for AI Training"

Originally Published 1 year ago — by The Verge

Source: The Verge

Automattic, the owner of Tumblr and WordPress.com, is reportedly in talks with AI companies Midjourney and OpenAI to provide training data scraped from users' posts, potentially creating a new revenue stream for the company. Automattic plans to launch a new setting that lets users opt out of data sharing with third parties, including AI companies. However, it is unclear what data has already been sent to the AI companies and how it has been used. The move reflects a trend of companies striking deals with AI tool makers for training data, but it has also sparked concerns among creators about their work being used for training. Automattic has struggled to monetize Tumblr since acquiring it in 2019 and is seeking new avenues for revenue.

OpenAI Accuses New York Times of Hacking ChatGPT for Lawsuit Evidence

Originally Published 1 year ago — by CNBC

Source: CNBC

OpenAI alleges that The New York Times "hacked" ChatGPT to generate examples of copyright infringement for its lawsuit, claiming the media company used deceptive prompts that violated OpenAI's terms of use. The filing comes amid a broader battle over the use of copyrighted material as AI training data, with OpenAI arguing that it is "impossible" to train top AI models without copyrighted works. The company has been courting publishers for content partnerships to gain access to materials for AI training, and it highlighted its opt-out process for publishers.

"Reddit and Google Strike $60M AI Training Data Deal"

Originally Published 1 year ago — by The Verge

Source: The Verge

Google has partnered with Reddit to access the platform's data API for AI training, allowing the tech giant to efficiently train models and display Reddit content across its products. The collaboration also grants Reddit access to Google's Vertex AI for improving search results. Despite previous tensions, the deal signifies a potential revenue stream for Reddit, which is preparing for an IPO and seeking to boost its valuation.

OpenAI Collaborates with Organizations to Enhance AI Training Data

Originally Published 2 years ago — by TechCrunch

Source: TechCrunch

OpenAI has announced its Data Partnerships program, which aims to collaborate with third-party organizations to develop new data sets for training AI models. The initiative seeks to address the flaws and biases present in existing data sets, which AI models can amplify in harmful ways. OpenAI plans to collect large-scale data sets that reflect human society and encompass various modalities, including images, audio, and video. The company is particularly interested in data that expresses human intention across different languages, topics, and formats. OpenAI will work with organizations to digitize training data and create both open source and private data sets. While the program aims to improve AI model understanding, concerns have been raised about potential bias and about compensation for data owners.

"Google Empowers Publishers with Opt-Out Switch for AI Training Data"

Originally Published 2 years ago — by The Verge

Source: The Verge

Google has introduced a new control called Google-Extended, which allows website publishers to opt out of having their data used to train the company's AI models while remaining accessible through Google Search. The control, set through robots.txt, lets publishers manage whether their sites contribute to improving AI applications. The move comes after some sites blocked OpenAI's web crawler to prevent data scraping for AI training, and amid concerns about whether publishers could block Google's AI training without affecting search indexing.
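
To make the robots.txt opt-out concrete, here is a minimal sketch of what such a rule set might look like and how it can be checked locally. The robots.txt contents, the example.com URL, and the helper names `build_parser` and `may_crawl` are illustrative assumptions, not taken from the article. The check uses Python's standard-library robots.txt parser, which only approximates how a crawler might interpret the file; it does not show how Google itself processes it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (example.com):
# block the Google-Extended token, leave every other crawler allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str,
              url: str = "https://example.com/article") -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print("Google-Extended:", may_crawl(parser, "Google-Extended"))
print("Googlebot:", may_crawl(parser, "Googlebot"))
```

Under these assumed rules, the check should report that Google-Extended is disallowed while Googlebot, which falls under the wildcard entry, is still allowed. That is the separation the summary describes: opting out of AI training without dropping out of search.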