
Content Scraping

All articles tagged with #content-scraping

technology • 6 months ago • 4 min saved

Perplexity AI Faces Accusations of Stealth Data Scraping and Evasion

Perplexity AI has been accused of covertly scraping website content by disguising its bots and ignoring no-crawl directives, raising concerns about ethical data collection and the impact on web publishers. Even when site owners block them, Perplexity's bots reportedly continue to bypass the restrictions by concealing their identity. This contributes to a broader surge in AI data scraping that threatens the sustainability of web content monetization. The issue highlights ongoing tensions between AI companies and website owners over data access and compensation.

via theregister.com

#ai #content-scraping #perplexity

technology • 8 months ago • 2 min saved

Reddit Sues Anthropic Over AI Data Scraping and Unfair Practices

Reddit has filed a lawsuit against AI company Anthropic, accusing it of scraping content from sports-focused communities on Reddit without permission. The case raises broader concerns about web scraping and the use of content as AI training data, especially with regard to user privacy and content rights.

via Awful Announcing

#ai-lawsuit #anthropic #content-scraping

technology • 1 year ago • 2 min saved

AI Companies Accused of Ignoring Web Standards and Copyright Laws

Several AI companies are reportedly ignoring the Robots Exclusion Protocol (robots.txt) to scrape content from websites without permission, leading to disputes with publishers. TollBit, a content licensing startup, has highlighted widespread non-compliance, with AI firms using the scraped data for training without authorization. This has led to legal action and to negotiations over licensing deals, as the debate over the legality and value of using content to train generative AI continues.

via Tom's Hardware

#ai #content-scraping #copyright-infringement

technology • 2 years ago • 3 min saved

Redditors Successfully Troll AI News Mill with Fake WoW Feature

Redditors pranked an AI-powered news mill by posting a fake announcement about the introduction of "Glorbo" to World of Warcraft. The news mill, called The Portal, mindlessly regurgitated the post and published an article about Glorbo, likely written by a bot. The incident exposed The Portal's automated scraping of Reddit content and prompted users to try to game the bots. After the prank gained attention on social media, The Portal took down the Glorbo post and removed all World of Warcraft content from its site. The scraping was likely intended to boost The Portal's search rankings and drive traffic to the site.

via Ars Technica

#ai #content-scraping #gaming