The Common Pile

EleutherAI Releases Large Open-Source Dataset to Promote Fair and Legal AI Training

Originally Published 7 months ago — by TechCrunch

Source: TechCrunch

EleutherAI has released The Common Pile v0.1, an 8 TB dataset of openly licensed and public-domain text for training AI models, with the aim of increasing transparency and reducing reliance on copyrighted material. Models trained on the dataset perform comparably to models trained on unlicensed data, challenging the notion that unlicensed data is necessary for high performance. The release is part of a broader effort to promote open data and transparency in AI research amid ongoing legal debates over training data.