
Harvard Unveils Free AI Training Dataset Backed by OpenAI and Microsoft
Harvard University is releasing a dataset of nearly 1 million public-domain books for use in AI training, with funding from OpenAI and Microsoft. The initiative aims to democratize access to high-quality training data and to offer an alternative to copyrighted material. The dataset, part of Harvard's Institutional Data Initiative, spans works ranging from Shakespeare to Czech math textbooks. The release comes amid legal challenges over the use of copyrighted data in AI training and joins ongoing efforts to build accessible training resources.













