Researchers Challenge Industry Claims on Ethical AI and Copyright

June 7, 2025 at 03:15 PM

•

1 min read

Researchers Challenge Industry Claims on Ethical AI and Copyright — Photo: futurism.com

TL;DR Summary

A team of researchers from MIT, Cornell, and other institutions successfully trained a large language model using only ethically-sourced, publicly licensed data, challenging the industry belief that such development is impossible without vast resources. They created the Common Pile dataset, manually curated over eight terabytes of data, and trained a seven billion-parameter AI that rivals older industry models, highlighting ethical concerns around data use and copyright in AI development.

Topics:business #ai #copyright #ethically-sourced-data #large-language-model #mit #technology

Share this article

The Tech Industry Said It Was "Impossible" to Create AI Based Entirely on Ethically-Sourced Data, So These Scientists Proved Them Wrong in Spectacular Fashion futurism.com
Analysis | AI firms say they can’t respect copyright. These researchers tried. The Washington Post
Practical commentary regarding copyright and generative AI training | Canada | Global law firm Norton Rose Fulbright
Foley Attorneys Explore IP Considerations for AI-Generated Logos Foley & Lardner LLP
Voice, Likeness, and Fair Use in the Age of AI Disruptive Competition Project

Reading Insights

Total Reads

Unique Readers

Time Saved

3 min

vs 4 min read

Condensed

89%

642 → 69 words

Want the full story? Read the original article

Read on futurism.com

JavaScript Required

tl;dr daily news requires JavaScript to be enabled. Please enable JavaScript in your browser settings.

Related Sources

Reading Insights