ADL Study Ranks Grok Last Among Major AI Chatbots at Detecting Antisemitism

TL;DR Summary
The Anti-Defamation League evaluated six large language models (Grok, ChatGPT, Llama, Claude, Gemini, and DeepSeek) on prompts about antisemitism, anti-Zionism, and extremism. Claude scored highest (80/100) and Grok lowest (21/100), with Grok performing especially poorly in multi-turn dialogues and image analysis. Every model showed gaps in safety and bias detection and needs improvement. In its public materials, the ADL chose to foreground the best performers rather than spotlight the worst.
- Grok is the most antisemitic chatbot according to the ADL (The Verge)
- ADL rates Anthropic’s Claude best AI model at detecting antisemitism (Jewish Insider)
- ‘Early enough’ to stop artificial intelligence from having social media’s Jew-hatred problem, ADL says (JNS.org)
- ADL Study Ranks Grok Worst AI for Antisemitic Content (The Tech Buzz)
- Standing Together: Combating Antisemitism, Extremism & Emerging Threats in AI and Online (jewishboston.com)