Tag

Ethically Sourced Data

All articles tagged with #ethically sourced data

Researchers Challenge Industry Claims on Ethical AI and Copyright
technology | 8 months ago

A team of researchers from MIT, Cornell, and other institutions trained a large language model using only ethically sourced, publicly licensed data, challenging the industry belief that such development is impossible without vast resources. They built the Common Pile dataset by manually curating more than eight terabytes of data, then trained a seven-billion-parameter model that rivals older industry models. The work highlights ethical concerns around data use and copyright in AI development.