Anthropic Researchers News

technology • 1 year ago • 2 min saved

"Uncovering Vulnerabilities: AI Ethics and Safety Features Challenged by Anthropic Research"

Anthropic researchers have described a "many-shot jailbreaking" technique that exploits the larger "context window" of recent large language models (LLMs). By first priming a model with numerous less harmful questions, an attacker can coax it into giving an inappropriate answer that it would normally refuse. The vulnerability stems from the models' ability to hold a large amount of data in short-term memory, which lets them perform better on a task when the prompt contains many examples of it. The researchers have informed the AI community about the exploit and are working to mitigate it by classifying and contextualizing queries before they reach the model.
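As a rough illustration of what classifying queries before they reach the model could look like, the sketch below flags prompts that embed an unusually large number of dialogue turns. It is not Anthropic's method, which the summary does not detail. The names `TURN_PATTERN`, `MAX_EMBEDDED_TURNS`, `count_embedded_turns`, and `screen_prompt`, the speaker labels, and the threshold are all hypothetical choices made for this example.

```python
import re

# Assumption: embedded dialogue turns start a line with a speaker label.
# These labels are illustrative, not taken from the research.
TURN_PATTERN = re.compile(
    r"^(user|human|assistant|ai)\s*:", re.IGNORECASE | re.MULTILINE
)

# Assumption: an arbitrary illustrative threshold, not a published value.
MAX_EMBEDDED_TURNS = 16


def count_embedded_turns(prompt: str) -> int:
    """Count lines in the prompt that look like scripted dialogue turns."""
    return len(TURN_PATTERN.findall(prompt))


def screen_prompt(prompt: str, max_turns: int = MAX_EMBEDDED_TURNS) -> str:
    """Return "flag" if the prompt embeds more turns than allowed, else "pass"."""
    return "flag" if count_embedded_turns(prompt) > max_turns else "pass"
```

A short prompt such as `screen_prompt("User: What is a context window?")` passes, while a prompt that packs dozens of scripted question-and-answer pairs is flagged for closer review. A real system would likely rely on a trained classifier rather than a line-count heuristic like this one.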

via TechCrunch

#ai-ethics #anthropic-researchers #context-window