Open Domain Text

All articles tagged with #open domain text

technology · 7 months ago

EleutherAI Releases Large Open-Source Dataset to Promote Fair and Legal AI Training

EleutherAI has released The Common Pile v0.1, an 8TB dataset of licensed and open-domain text for training AI models, with the aim of increasing transparency and reducing reliance on copyrighted material. Models developed with the dataset perform comparably to proprietary ones, challenging the notion that unlicensed data is necessary for high performance. The release is part of a broader effort to promote open data and transparency in AI research amid ongoing legal debates.