Several genAI models [2], including those from Microsoft [2], OpenAI [1] [2], Google [1] [2], Meta [1] [2], and others, are vulnerable to a new attack called “Skeleton Key.” This attack can bypass ethical and safety guardrails [2], potentially allowing users to access offensive or illegal content [2].

Description

The “Skeleton Key” attack affects multiple genAI models, such as those from Microsoft Azure, Meta [1] [2], Google Gemini [1] [2], Open AI [2], Mistral [2], Anthropic [1] [2], and Cohere [2]. Microsoft has introduced prompt shields in Azure to detect and block this tactic and has informed other affected vendors [2]. Admins are advised to update their models with fixes and implement input filtering [2], additional guardrails [2], and output filtering to prevent malicious intent and responses that violate safety criteria [2]. The attack can bypass restrictions on chatbots like ChatGPT [1], Google Gemini [1] [2], OpenAI’s 3.5 Turbo [1], GPT-4o [1], Google’s Gemini Pro [1], Meta’s Llama 3 [1], and Anthropic’s Claude 3 Opus [1], allowing them to engage in prohibited activities [1].

Conclusion

The “Skeleton Key” attack poses a significant threat to the security and integrity of genAI models. It is crucial for companies to implement controls to prevent similar exploits and ensure the safety of their AI systems. Microsoft’s proactive approach in testing and patching this vulnerability sets a precedent for other AI companies to follow suit in safeguarding their models against potential attacks.

References

[1] https://me.pcmag.com/en/ai/24364/microsoft-skeleton-key-jailbreak-can-trick-major-chatbots-into-behaving-badly
[2] https://www.darkreading.com/application-security/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content