Reddit will block the Internet Archive's Wayback Machine from crawling most of its content after finding that AI companies were scraping Reddit data from the archive, citing privacy concerns and policy violations. The move limits the archive to indexing only Reddit's homepage and is meant to protect user data and enforce platform policies. Reddit has previously restricted access to its data for AI training and has ongoing disputes with AI companies over data scraping.
ChatGPT conversations have become publicly accessible through Google Search and the Wayback Machine, raising significant privacy concerns. OpenAI has not requested their removal from these platforms, which highlights ongoing problems with user privacy in AI applications.
Google has retired its "cached" link feature, which let users view stored copies of web pages, citing improved page loading across the web and cost savings as reasons for the change. The responsibility for preserving old versions of web pages now falls more heavily on the Internet Archive's Wayback Machine, and Google may partner with the Internet Archive to show historical versions of pages in search results. Users can still access cached pages through the URL Inspector tool in Google Search Console or by building their own cache links using specific URL formats.
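As a rough illustration of what "building their own cache links" can look like, the sketch below assembles two kinds of URLs in Python. The summary above does not spell out the formats, so both prefixes are assumptions: the first is the address pattern historically used by Google's cache, which may no longer resolve, and the second is the Wayback Machine's public snapshot pattern. The function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: historical Google cache address pattern; it may no longer resolve.
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"
# Assumption: Wayback Machine snapshot pattern (an optional timestamp, then the page URL).
WAYBACK_PREFIX = "https://web.archive.org/web/"


def google_cache_link(page_url: str) -> str:
    """Build a Google cache-style link for page_url (hypothetical helper)."""
    # Percent-encode the target so its own query string does not leak into ours.
    return GOOGLE_CACHE_PREFIX + quote(page_url, safe=":/")


def wayback_link(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine link, optionally pinned to a YYYYMMDDhhmmss prefix."""
    middle = f"{timestamp}/" if timestamp else ""
    return WAYBACK_PREFIX + middle + page_url


if __name__ == "__main__":
    print(google_cache_link("https://example.com/page"))
    print(wayback_link("https://example.com/page", "2024"))
```

With no timestamp, the Wayback Machine link points at the most recent snapshot it holds, if any exists; neither helper checks that a stored copy is actually available.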