Perplexity AI has been accused of covertly scraping website content by disguising its bots and ignoring no-crawl directives, raising concerns about ethical data collection and the impact on web publishers. According to the accusations, Perplexity's bots conceal their identity and continue to bypass site restrictions, contributing to a surge in AI data scraping that threatens the sustainability of web content monetization. The issue highlights ongoing tensions between AI companies and website owners over data access and compensation.
Reddit has filed a lawsuit against AI company Anthropic, accusing it of scraping content from sports-focused communities on Reddit without permission. The case raises broader concerns about web scraping and the use of AI training data, particularly with respect to user privacy and content rights.
Several AI companies are reportedly ignoring the Robots Exclusion Protocol (robots.txt) to scrape content from websites without permission, leading to disputes with publishers. TollBit, a content licensing startup, has highlighted widespread non-compliance, with AI firms using data for training without authorization. This has resulted in legal actions and negotiations for licensing deals, as the debate over the legality and value of using content to train generative AI continues.
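For readers unfamiliar with the Robots Exclusion Protocol, the following is a minimal sketch of the check a compliant crawler would be expected to run before requesting a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the bot names (`ExampleAIBot`, `SomeOtherBot`), the `example.com` URLs, and the helper functions are all hypothetical and chosen for illustration; none of them comes from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only (not from any real site).
# It blocks one named AI crawler entirely and keeps every other crawler
# out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named AI crawler is disallowed everywhere.
print(may_fetch(parser, "ExampleAIBot", "https://example.com/articles/1"))  # False
# Any other crawler falls back to the wildcard rules.
print(may_fetch(parser, "SomeOtherBot", "https://example.com/articles/1"))  # True
print(may_fetch(parser, "SomeOtherBot", "https://example.com/private/x"))   # False
```

The check depends entirely on the crawler running it and truthfully reporting its user agent. Nothing in the protocol enforces the result, which is why a crawler that ignores the file or presents a different user agent can bypass it.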
Redditors pranked an AI-powered news mill by posting a fake announcement about the introduction of "Glorbo" to World of Warcraft. The news mill, called The Portal, mindlessly regurgitated the post and published an article about Glorbo, likely written by a bot. The incident exposed The Portal's automated scraping of Reddit content and prompted users to try to game the bots. The prank gained attention on social media, leading The Portal to take down the Glorbo article and remove all World of Warcraft content from its site. The scraping is likely intended to boost The Portal's search rankings and increase traffic to the site.