
“Sure, I’ll Help You Build a Bomb”: New Research Reveals Alarming Gaps in AI Safety
A new class of adversarial attacks can circumvent the alignment measures designed to prevent multiple LLMs from generating harmful or inappropriate content.