
AI Models Unintentionally Adopt Dangerous Behaviors During Training
A study by Anthropic and Truthful AI reveals that large language models can transmit behavioral traits to other models through hidden signals in training data, even when such traits are not explicitly mentioned, posing new challenges for AI safety and alignment.