Anna's Archive, which describes itself as a non-profit focused on cultural preservation, has scraped and backed up a 300-terabyte archive of Spotify's music, including metadata for 256 million tracks and audio files for 86 million, aiming to preserve humanity's musical heritage despite Spotify's efforts to prevent unauthorized scraping.
A pirate activist group has scraped and copied nearly all of Spotify's music catalog, including metadata for 256 million tracks and audio files for 86 million, claiming to build the world's first music preservation archive. Spotify is investigating the incident, which involved illicit tactics to access some audio files, and the group plans to release the data publicly in order of popularity.
Recent leaks suggest that ChatGPT conversations, including sensitive user prompts, have been appearing in Google Search Console, raising concerns about OpenAI scraping Google search results and compromising user privacy. OpenAI has acknowledged the issue and claimed to have fixed a glitch, but questions remain about the extent of data scraping and the effectiveness of their response.
Reddit has sued AI company Perplexity and associated data-scraping firms for illegally scraping its data, saying it set a trap, a test post, to catch the circumvention, and alleging that Perplexity bypassed its protections by pulling Reddit content from Google search results without permission.
Reddit has sued AI companies, including Perplexity, after using a data trap to catch them scraping copyrighted content from Reddit without permission, highlighting ongoing disputes over AI training data and copyright infringement.
Reddit is suing Perplexity and three data-scraping companies for unlawfully scraping its content to train AI models, alleging that Perplexity is a customer of these scrapers and has continued to increase its Reddit citations despite cease-and-desist letters. Reddit claims these actions circumvent technological protections and violate copyright, and says it aims to prevent industrial-scale theft of its data for AI training.
Reddit has sued AI company Anthropic, accusing it of unlawfully scraping its data for years without permission to train the Claude chatbot, despite Reddit's efforts to enforce its data use policies and seek licensing agreements.
Reddit has sued AI startup Anthropic for unauthorized use of its data to train AI models, claiming violations of its user agreement and data scraping without permission, in a significant legal challenge to AI data practices.
Apple, Nvidia, and other tech giants have been accused of using YouTube videos to train AI models without the creators' consent. Tech YouTuber Marques Brownlee highlighted that Apple sourced data from companies that scraped YouTube content, including his own. This practice, which violates YouTube's terms of service, has raised significant concerns about unauthorized content scraping in the tech industry.
Major tech companies like Apple, Salesforce, and Anthropic have trained their AI models using YouTube videos without creators' consent, potentially violating YouTube's terms. The dataset, known as "the Pile," was compiled by EleutherAI and includes captions from over 173,000 YouTube videos. Content creators are frustrated and critical of this unauthorized use, raising concerns about intellectual property rights and the ethics of data scraping.
OpenAI's use of YouTube videos to train its AI models has raised questions about how it accesses such data, given Google's restrictions on scraping and downloading large volumes of YouTube content. The company has not confirmed whether it has downloaded YouTube videos at scale or bypassed Google's limitations. As the demand for high-quality training data for AI models grows, ethical and legal questions about data scraping and fair use of online content remain unresolved in the AI community.
Midjourney has indefinitely banned all employees of rival AI firm Stability AI from its service after detecting "botnet-like" activity, suspected to be a Stability employee attempting to scrape prompt-and-image pairs in bulk, that caused a 24-hour outage. The move comes after Midjourney itself faced criticism for using training data scraped from the Internet without permission. Stability AI CEO Emad Mostaque said the incident was unintentional and that his company doesn't need Midjourney's data, emphasizing its use of synthetic and other data.
Midjourney has banned all Stability AI employees from using its service indefinitely, alleging that "botnet-like activity from paid accounts" linked to Stability AI employees caused a recent server outage during an attempt to scrape Midjourney's data. Stability AI CEO Emad Mostaque denies ordering the actions and says that if a Stability employee caused the outage, it was unintentional. The situation is still developing, and neither company has responded to requests for comment. The incident has drawn criticism of both companies for training their AI models on online data scraped without consent.
Nightshade, a new software tool developed by researchers at the University of Chicago, is now available for anyone to try as a means of protecting artists' and creators' work from being used to train AI models without consent. By "poisoning" images, Nightshade can make them unsuitable for AI training, leading to unpredictable model outputs and potentially deterring AI companies from using unauthorized content. The tool makes subtle changes to images that are imperceptible to humans but significantly affect how AI models interpret and generate content. Nightshade can also work in conjunction with Glaze, the team's earlier tool that masks an artist's personal style from AI mimicry, offering both offensive and defensive approaches to content protection.
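Nightshade's actual poisoning method optimizes perturbations against specific image models. As a rough, hedged illustration of the general idea only (a bounded pixel-level change that stays visually imperceptible), a toy sketch in NumPy might look like this; the function name and random noise are illustrative stand-ins, not Nightshade's algorithm:

```python
import numpy as np

def add_bounded_perturbation(image: np.ndarray, epsilon: float = 2.0,
                             seed: int = 0) -> np.ndarray:
    """Toy stand-in for a poisoning step: shift each channel value by at
    most `epsilon`, keeping the image visually unchanged to a human eye.

    Real poisoning tools optimize the perturbation against a target
    model; here the perturbation is simply random noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-epsilon, epsilon, size=image.shape)
    # Clip to the valid pixel range before converting back to uint8.
    poisoned = np.clip(image.astype(np.float64) + noise, 0, 255)
    return poisoned.astype(np.uint8)

# Usage: perturb a uniform gray image and confirm the change is tiny.
img = np.full((8, 8, 3), 128, dtype=np.uint8)
out = add_bounded_perturbation(img, epsilon=2.0)
diff = out.astype(int) - img.astype(int)
```

The point of the bound is that a per-pixel change of a couple of intensity levels is invisible to a person, while a carefully *optimized* (rather than random) perturbation of the same magnitude can systematically mislead a model's feature extractor.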
The BBC has outlined its principles for evaluating the use of generative AI in journalism, stating that it believes the technology can provide more value to audiences and society. The broadcaster plans to work with tech companies, media organizations, and regulators to develop generative AI while maintaining trust in the news industry. At the same time, the BBC has blocked web crawlers from OpenAI and Common Crawl from accessing its websites, joining other news organizations in safeguarding their copyrighted material. The BBC says the move protects the interests of license fee payers, arguing that training AI models on BBC data without permission is not in the public interest.
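Blocking crawlers of this kind is typically done via a site's robots.txt file. A minimal sketch, using the publicly documented user-agent tokens for OpenAI's crawler (GPTBot) and Common Crawl's crawler (CCBot), might look like:

```
# robots.txt — disallow AI-training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Note that robots.txt is advisory: it relies on crawlers voluntarily honoring the rules rather than enforcing any access control, which is part of why publishers also pursue legal and technical measures.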