Tech Giants' Unconventional Data Harvesting for A.I. Training

TL;DR Summary
OpenAI reportedly transcribed more than a million hours of YouTube videos to train its GPT-4 model, despite YouTube's rules against unauthorized scraping or downloading of content. Google has used similar methods to train its own AI models. As demand for training data grows and existing sources are exhausted, companies are turning to increasingly aggressive means of capturing new data for more advanced models.
- YouTube rules broken by OpenAI and Google for training data (9to5Google)
- How Tech Giants Cut Corners to Harvest Data for A.I. (The New York Times)
- OpenAI Reportedly Transcribed 1 Million Hours of YouTube Videos to Train GPT-4 (Gizmodo)
- Why OpenAI and Other Data-Hungry AI Companies Need a Bigger Internet (The Wall Street Journal)
- YouTube Says OpenAI Training Sora With Its Videos Would Break Rules (Bloomberg)