Anthropic Shares Research on a Technique That Exploits Long Context Windows to Jailbreak Large Language Models
Many-shot jailbreaking works by prompting the model with a large number of fictitious question-answer pairs that depict the AI assistant readily providing harmful or dangerous responses; a final target query is appended at the end, and the long run of in-context examples conditions the model to comply with that final request.
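To make the prompt layout concrete, here is a minimal sketch of the structure the research describes: many faux user/assistant turns concatenated ahead of the real target question. The function name and the placeholder Q&A pairs below are assumptions invented for illustration, and the pairs are deliberately harmless; the original research used fictitious harmful exchanges, which are not reproduced here.

```python
# Minimal sketch of the many-shot prompt layout: a long run of faux
# user/assistant turns followed by the real target question. The
# placeholder pairs are harmless and purely illustrative.

def build_many_shot_prompt(faux_dialogue, target_question):
    """Concatenate faux user/assistant turns, then append the real question."""
    turns = []
    for question, answer in faux_dialogue:
        turns.append(f"User: {question}")
        turns.append(f"Assistant: {answer}")
    turns.append(f"User: {target_question}")
    turns.append("Assistant:")
    return "\n".join(turns)

if __name__ == "__main__":
    # Hypothetical placeholder pairs; a real many-shot prompt would contain
    # hundreds of such turns to fill a long context window.
    placeholder_pairs = [
        ("What is 2 + 2?", "4."),
        ("Name a primary color.", "Red."),
    ] * 128  # repeated to mimic the "large number" of shots
    prompt = build_many_shot_prompt(placeholder_pairs, "What is the capital of France?")
    print(prompt[:200])  # preview the start of the assembled prompt
```

The key property is simply scale: the more faux turns the context window can hold, the more strongly the preceding examples steer the model's final response.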