"Tech Giants' Unconventional Data Harvesting for A.I. Training"

April 8, 2024 at 02:39 PM

TL;DR Summary

OpenAI reportedly used transcripts of more than a million hours of YouTube videos to train its GPT-4 model, despite YouTube's rules against unauthorized scraping or downloading of content. Google has used similar methods to train its own AI models. As demand for training data grows and existing sources are depleted, companies are resorting to aggressive tactics to capture new data for training more advanced models.

Topics: business #ai-training #google #gpt-4 #openai #technology #youtube

YouTube rules broken by OpenAI and Google for training data (9to5Google)
How Tech Giants Cut Corners to Harvest Data for A.I. (The New York Times)
OpenAI Reportedly Transcribed 1 Million Hours of YouTube Videos to Train GPT-4 (Gizmodo)
Why OpenAI and Other Data-Hungry AI Companies Need a Bigger Internet (The Wall Street Journal)
YouTube Says OpenAI Training Sora With Its Videos Would Break Rules (Bloomberg)
