Ethically Sourced Data News

Technology • 8 months ago

Researchers Challenge Industry Claims on Ethical AI and Copyright

A team of researchers from MIT, Cornell, and other institutions has trained a large language model using only ethically sourced, publicly licensed data. The result challenges the industry belief that such a model cannot be built without vast resources. To do it, the researchers manually curated the Common Pile, a dataset of more than eight terabytes, and used it to train a seven-billion-parameter model that rivals older industry models. The work puts fresh focus on ethical concerns about how AI developers use data and handle copyright.

via futurism.com