Tag

Content Scraping

All articles tagged with #content scraping

Perplexity AI Faces Accusations of Stealth Data Scraping and Evasion
technology · 6 months ago

Perplexity AI has been accused of covertly scraping website content by disguising its bots and ignoring no-crawl directives, raising concerns about ethical data collection and the impact on web publishers. The bots reportedly continue to bypass site restrictions while concealing their activity, contributing to a surge in AI data scraping that threatens the sustainability of web content monetization. The issue highlights ongoing tensions between AI companies and website owners over data access and compensation.

AI Companies Accused of Ignoring Web Standards and Copyright Laws
technology · 1 year ago

Several AI companies are reportedly ignoring the Robots Exclusion Protocol (robots.txt) to scrape content from websites without permission, leading to disputes with publishers. TollBit, a content licensing startup, has highlighted widespread non-compliance, with AI firms using data for training without authorization. This has resulted in legal actions and negotiations for licensing deals, as the debate over the legality and value of using content to train generative AI continues.
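
For readers unfamiliar with the Robots Exclusion Protocol mentioned above, the following is a minimal sketch of how a compliant crawler might check a site's robots.txt before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from any of the companies or sites discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler entirely and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The protocol is advisory: nothing in robots.txt technically prevents a crawler from skipping this check or from presenting a different user-agent string, which is the kind of non-compliance the reports describe.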

Redditors Successfully Troll AI News Mill with Fake WoW Feature
technology · 2 years ago

Redditors pranked an AI-powered news mill by posting a fake announcement about the introduction of "Glorbo" to World of Warcraft. The news mill, called The Portal, mindlessly regurgitated the post and published an article about Glorbo, likely written by a bot. This incident exposed the automated content scraping of Reddit by The Portal and prompted users to try to game the bots. The prank gained attention on social media, leading The Portal to take down the Glorbo post and remove all World of Warcraft content from its site. The content scraping is likely done to boost the search rankings of The Portal and increase traffic to the site.