AI Companies Bypass Web Standards, Face Legal Threats Over Content Scraping

June 21, 2024 at 10:04 PM

TL;DR Summary

OpenAI and Anthropic are reportedly ignoring or bypassing robots.txt, the voluntary web standard that site owners use to tell automated crawlers which pages not to access, in order to collect data for training their AI models. Both companies have publicly said they respect these blocks, but findings by the content-licensing firm TollBit suggest otherwise. The practice has raised concerns among media publishers and highlights the ongoing tension between AI companies' data needs and copyright protections.
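
For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, the "SomeOtherBot" user agent, and the is_allowed helper are all hypothetical and chosen for illustration; they are not taken from any publisher or from the TollBit findings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (not from any real site).
# It blocks one named crawler entirely and keeps everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(agent, story) else "blocked"
        print(f"{agent}: {verdict}")
```

The check is purely advisory: nothing in the protocol stops a crawler that skips this step, which is why compliance depends on each company choosing to honor it.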

Topics: business #ai #anthropic #openai #robotstxt #technology #web-scraping

Related Sources

OpenAI, Anthropic Ignore Rule That Prevents Bots Scraping Web Content (Business Insider)
Exclusive: Multiple AI companies bypassing web standard to scrape publisher sites, licensing firm says (Reuters)
Several AI companies said to be ignoring robots dot txt exclusion, scraping content without permission: report (Tom's Hardware)
Wired: AI startup Perplexity is 'BS machine' (CNBC)
Forbes letter threatens legal action against Perplexity AI over copyright (Axios)
