Google's AI trained on controversial websites and web sewers.

Source: The Register
TL;DR Summary

Google's C4 dataset, used to train large language models, contains problematic and harmful content from websites such as Stormfront, Kiwi Farms, and 4chan. While efforts are made to filter out unwanted content, the review process is imperfect. The dataset also includes copyrighted material, and it's unclear whether companies using it for AI products are liable for infringement. The investigation highlights the potential for next-gen machine-learning systems to behave inappropriately and unreliably due to the ingestion of concerning material.
