Tag

Ai Training Data

All articles tagged with #ai training data

technology1 month ago•3 min saved

Wikipedia signs paid-data deals with AI firms to fund its infrastructure

The Wikimedia Foundation has begun paid data-access deals with AI firms including Amazon, Meta, Microsoft, Mistral AI, and Perplexity to monetize Wikipedia’s data and help cover rising infrastructure costs from automated scraping, signaling a shift from donation-based funding to enterprise partnerships; the foundation also envisions AI tools to assist editors and a conversational search experience that cites verified text.

via TechSpot|

#ai-training-data #data-licensing #large-language-models

technology1 month ago•4 min saved

Wikipedia opens paid licensing for AI training with major tech firms

The Wikimedia Foundation expanded Wikimedia Enterprise to offer paid licensing of Wikipedia content to major tech firms—Microsoft, Meta, Amazon, Perplexity, and Mistral AI—letting them use Wikipedia’s 65 million articles to train AI models, joining Google's existing deal. The move monetizes API access to offset rising infrastructure costs as AI scrapes increase, while editors push back on AI-generated content and bot traffic remains a concern.

via Ars Technica|

#ai-training-data #big-tech #bot-scraping

technology4 months ago•1 min saved

World Models: The Future of AI Innovation

Pim de Witte's Medal platform, which hosts billions of gaming videos, is being spun out into a new AI lab called General Intuition with a $133.7 million seed round, aiming to develop world models for AI using gaming data, highlighting the growing importance of foundational models in AI development.

via The Verge|

#ai #ai-training-data #foundational-models

technology6 months ago•2 min saved

ElevenLabs Debuts AI Music Generator in Industry Collaboration

ElevenLabs has launched an AI music generator that is claimed to be cleared for commercial use, expanding beyond its traditional text-to-speech tools. The company has shared samples of AI-generated music and announced partnerships with major music publishers to use their material for training, amid ongoing legal concerns about copyright infringement in AI music development.

via TechCrunch|

#ai-music-generation #ai-training-data #copyright-issues

legal7 months ago•1 min saved

US Authors Prepare Class-Action Lawsuit Against Anthropic Over AI Book Piracy

A California federal judge has allowed a class-action lawsuit against Anthropic, alleging the AI company downloaded up to seven million copyrighted books from pirated sources to train its chatbot Claude, violating the Copyright Act. The lawsuit, filed by authors including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, joins a broader trend of legal actions against AI firms over copyright issues, with some cases focusing on unauthorized data use and others on licensing disputes.

via The Verge|

#ai-training-data #anthropic #class-action-lawsuit

entertainment7 months ago•13 min saved

The Future of Music and AI: Opportunities and Challenges

The article explores the potential for AI to disrupt the music industry, highlighting legal battles over copyright infringement, the impact on musicians' careers, and the importance of licensing AI training data to protect creators' rights.

via The Verge|

#ai-music #ai-training-data #copyright-law

technology8 months ago•2 min saved

Reddit Sues Anthropic Over Data Use and Bot Access

Reddit has sued Anthropic in San Francisco, alleging that Anthropic accessed Reddit over 100,000 times since July 2024 despite claims of blocking its bots, and accuses the company of exploiting Reddit's content for AI training, amidst a broader trend of legal actions against AI firms for copyright violations.

via The Verge|

#ai-lawsuits #ai-training-data #anthropic

technology8 months ago•2 min saved

Reddit Sues Anthropic Over Data Use and Bot Access

Reddit has sued AI startup Anthropic for unauthorized use of its data to train AI models, claiming violations of user agreements and data scraping without permission, marking a significant legal challenge in AI data practices.

via TechCrunch|

#ai-training-data #anthropic #data-scraping

technology2 years ago•2 min saved

"Report: Tumblr and WordPress Strike Deals to Sell User Data for AI Training"

Automattic, the owner of Tumblr and WordPress.com, is reportedly in talks with AI companies Midjourney and OpenAI to provide training data scraped from users' posts, potentially creating a new revenue stream for the site. The company plans to launch a new setting allowing users to opt out of data sharing with third parties, including AI companies. However, it's unclear what data has been sent to the AI companies and how it has been used. This move reflects a trend of companies striking deals with AI tool makers for training data, but it has also sparked concerns from the creative community about their work being used for training. Automattic has struggled to monetize Tumblr since acquiring it in 2019 and is seeking new avenues for revenue.

via The Verge|

#ai-training-data #automattic #midjourney

technology2 years ago•3 min saved

OpenAI Accuses New York Times of Hacking ChatGPT for Lawsuit Evidence

OpenAI alleges that The New York Times "hacked" its ChatGPT to generate copyright infringement examples for a lawsuit, claiming the media company used deceptive prompts violating OpenAI's terms of use. The filing comes amid a broader battle over using copyrighted material for AI training data, with OpenAI arguing it's "impossible" to train top AI models without copyrighted works. The company has been courting publishers for content partnerships and highlighted its opt-out process for publishers while actively engaging with them to gain access to materials for AI training.

via CNBC|

#ai-training-data #copyright-infringement #lawsuit

technology2 years ago•1 min saved

"Reddit and Google Strike $60M AI Training Data Deal"

Google has partnered with Reddit to access the platform's data API for AI training, allowing the tech giant to efficiently train models and display Reddit content across its products. The collaboration also grants Reddit access to Google's Vertex AI for improving search results. Despite previous tensions, the deal signifies a potential revenue stream for Reddit, which is preparing for an IPO and seeking to boost its valuation.

via The Verge|

#ai-training-data #data-api #google

artificial-intelligence2 years ago•2 min saved

OpenAI Collaborates with Organizations to Enhance AI Training Data

OpenAI has announced its Data Partnerships program, aiming to collaborate with third-party organizations to develop new data sets for training AI models. The initiative seeks to address the flaws and biases present in existing data sets, which can lead to harmful amplification by AI models. OpenAI plans to collect large-scale data sets that reflect human society and encompass various modalities, including images, audio, and video. The company is particularly interested in data that expresses human intention across different languages, topics, and formats. OpenAI will work with organizations to digitize training data and create both open source and private data sets. While the program aims to improve AI model understanding, concerns have been raised about potential bias and compensation for data owners.

via TechCrunch|

#ai-training-data #artificial-intelligence #bias

technology2 years ago•1 min saved

"Google Empowers Publishers with Opt-Out Switch for AI Training Data"

Google has introduced a new tool called Google-Extended, which allows website publishers to opt out of having their data used to train the company's AI models while still being accessible through Google Search. The tool, available through robots.txt, enables publishers to control access to their content and manage whether their sites contribute to improving AI applications. This move comes as some sites have already blocked OpenAI's web crawler to prevent data scraping for AI training, and concerns have arisen about blocking Google's crawlers without affecting search indexing.

via The Verge|

#ai-training-data #data-privacy #google