Tag

EleutherAI

All articles tagged with #eleutherai

technology • 8 months ago • 3 min saved

EleutherAI Releases Large Open-Source Dataset to Promote Fair and Legal AI Training

EleutherAI has released The Common Pile v0.1, an 8TB dataset of openly licensed and public-domain text for training AI models, aiming to increase transparency and reduce reliance on copyrighted material. Models trained on the dataset perform comparably to proprietary ones, challenging the notion that unlicensed data is necessary for high performance. The release is part of a broader effort to promote open data and transparency in AI research amid ongoing legal debates.

via TechCrunch

#ai-models #ai-training-dataset #eleutherai

technology • 1 year ago • 4 min saved

Tech Giants Used YouTube Videos Without Consent to Train AI

Major tech companies such as Apple, Salesforce, and Anthropic have trained their AI models on YouTube videos without creators' consent, potentially violating YouTube's terms. The data comes from "the Pile," a dataset compiled by EleutherAI that includes captions from over 173,000 YouTube videos. Content creators have criticized this unauthorized use, raising concerns about intellectual property rights and the ethics of data scraping.

via Ars Technica

#ai #data-scraping #eleutherai

technology • 1 year ago • 2 min saved

Apple Used YouTube Videos Without Consent to Train AI

An investigation revealed that over 170,000 YouTube videos were used without permission to train AI systems for companies like Apple, Anthropic, Nvidia, and Salesforce. The subtitle data, part of EleutherAI's dataset the Pile, comes from videos by popular creators and news outlets. This practice raises concerns about data transparency and potential violations of YouTube's terms of service.

via The Verge

#ai #apple #data-privacy

technology • 1 year ago • 2 min saved

Apple Used YouTube Content, Including MKBHD, for AI Training Without Consent

Apple and other tech giants reportedly trained AI models using subtitle files from over 170,000 YouTube videos without creators' consent, violating YouTube's terms. The data was downloaded by EleutherAI, a non-profit, and included in a dataset called the Pile, which was used by companies like Apple, Nvidia, and Salesforce. This raises concerns about the legal implications of using web-scraped data for AI training.

via 9to5Mac

#ai #apple #data-privacy

technology • 2 years ago • 1 min saved

Mike Huckabee and Religious Authors Sue Tech Giants Over AI Copyright Infringement

Former Arkansas Governor Mike Huckabee and other authors have filed a lawsuit against Meta, Microsoft, EleutherAI, and Bloomberg, alleging that their books were pirated and used in datasets to train AI models without permission or compensation. This class action suit is the latest in a series of authors accusing tech companies of copyright infringement in the development of generative AI models. The case revolves around a dataset called "Books3," which contains over 180,000 works and is part of a larger collection called the Pile. AI companies rely on vast amounts of public data for training, leading to debates and legal actions regarding compensation for data providers.

via The Verge

#ai-copyright-lawsuit #eleutherai #meta

ai • 2 years ago • 4 min saved

The Rise of Open-Source ChatGPT Alternatives

Together Computer has released OpenChatKit, an open-source alternative to ChatGPT that gives developers more control over chatbot behavior and customization. The kit includes a large language model fine-tuned for chat, instructions on fine-tuning for specific tasks, an extensible retrieval system, and a moderation system. The model has limitations, but community contributions could improve it.

via KDnuggets

#ai #chatbot #eleutherai