Palo Alto Networks Unit 42 has developed Deceptive Delight, a jailbreak technique that circumvents the safety guardrails of large language models, achieving a 64.6% success rate at eliciting harmful content over multi-turn conversations.