Researchers Challenge Industry Claims on Ethical AI and Copyright

TL;DR Summary
A team of researchers from MIT, Cornell, and other institutions successfully trained a large language model using only ethically-sourced, publicly licensed data, challenging the industry belief that such development is impossible without vast resources. They created the Common Pile dataset, manually curated over eight terabytes of data, and trained a seven billion-parameter AI that rivals older industry models, highlighting ethical concerns around data use and copyright in AI development.
- The Tech Industry Said It Was "Impossible" to Create AI Based Entirely on Ethically-Sourced Data, So These Scientists Proved Them Wrong in Spectacular Fashion futurism.com
- Analysis | AI firms say they can’t respect copyright. These researchers tried. The Washington Post
- Practical commentary regarding copyright and generative AI training | Canada | Global law firm Norton Rose Fulbright
- Foley Attorneys Explore IP Considerations for AI-Generated Logos Foley & Lardner LLP
- Voice, Likeness, and Fair Use in the Age of AI Disruptive Competition Project
Reading Insights
Total Reads
0
Unique Readers
0
Time Saved
3 min
vs 4 min read
Condensed
89%
642 → 69 words
Want the full story? Read the original article
Read on futurism.com