A dataset of one million public posts from the social media platform Bluesky was scraped and uploaded to AI company Hugging Face for machine learning research, sparking controversy over user consent and data privacy. The dataset, which included user identifiers and metadata, was removed after backlash. Bluesky, which operates on an open and decentralized protocol, is exploring ways to allow users to communicate consent for data use to third-party developers.
Bluesky has stated it will not use user posts for AI training, but a recent incident in which a Hugging Face employee scraped and published data from 1 million Bluesky posts highlights the limits of that policy. Although the data was removed and the employee apologized, the incident underscores the difficulty of preventing third parties from using such data for AI purposes. Bluesky is exploring ways to let users specify consent for AI training, but ultimately it will be up to external developers to honor these settings.
IBM is launching WatsonX, a development studio for companies to "train, tune and deploy" machine-learning models. The platform includes an AI-generated code feature, an AI governance toolkit, and a library of thousands of large-scale AI models. IBM is partnering with Hugging Face, and clients and collaborators so far include SAP, NASA, Wix, and PyTorch. The new AI tools are expected to be integrated most easily into areas like customer care, procurement, cybersecurity, and elements of supply chain and IT operations.