| Model | Defense | Attack | Utility | Utility under attack | Targeted ASR | Date |
|---|---|---|---|---|---|---|
| claude-3-5-sonnet-20240620 | None | important_instructions | 80.41% | 50.72% | 34.18% | 2024-06-20 |
| gpt-4o-2024-05-13 | repeat_user_prompt | important_instructions | 84.54% | 65.66% | 30.84% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | tool_knowledge | 69.07% | 54.69% | 37.68% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | injecagent | 69.07% | 68.52% | 5.72% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 44.99% | 53.10% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | ignore_previous | 69.07% | 64.23% | 5.41% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | direct | 69.07% | 67.25% | 3.66% | 2024-06-05 |
| gpt-4o-2024-05-13 | transformers_pi_detector | important_instructions | 41.24% | 20.67% | 7.95% | 2024-06-05 |
| gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 53.26% | 7.47% | 2024-06-05 |
| gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 50.24% | 46.26% | 2024-06-05 |
| gpt-4-turbo-2024-04-09 | None | important_instructions | 60.82% | 51.67% | 31.96% | 2024-06-05 |
| claude-3-haiku-20240307 | None | important_instructions | 37.11% | 32.91% | 9.86% | 2024-06-05 |
| gpt-4-0125-preview | None | important_instructions | 65.98% | 40.70% | 56.28% | 2024-06-05 |
| gpt-3.5-turbo-0125 | None | important_instructions | 36.08% | 34.82% | 10.81% | 2024-06-05 |
| gemini-1.5-pro-001 | None | important_instructions | 43.30% | 27.50% | 31.48% | 2024-06-05 |
| gemini-1.5-flash-001 | None | important_instructions | 38.14% | 34.02% | 13.35% | 2024-06-05 |
| command-r-plus | None | important_instructions | 27.84% | 25.60% | 4.61% | 2024-06-05 |
| command-r | None | important_instructions | 28.87% | 30.52% | 3.34% | 2024-06-05 |
| claude-3-sonnet-20240229 | None | important_instructions | 53.61% | 30.05% | 30.52% | 2024-06-05 |
| claude-3-opus-20240229 | None | important_instructions | 65.98% | 49.60% | 12.88% | 2024-06-05 |
| meta-llama/Llama-3-70b-chat-hf | None | important_instructions | 30.93% | 19.40% | 20.67% | 2024-06-05 |
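
One way to read the defense rows is to compare how much utility each defense loses under attack against how far it lowers the targeted attack success rate (ASR). The following is a minimal sketch, not part of the benchmark's tooling: the `Row` type and the helpers `utility_drop` and `rank_by_asr` are hypothetical names introduced here for illustration, and the numbers are copied from the gpt-4o-2024-05-13 rows for the important_instructions attack in the table above.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical container; all values are percentages)."""
    model: str
    defense: str
    attack: str
    utility: float
    utility_under_attack: float
    targeted_asr: float


# Rows copied from the table: gpt-4o-2024-05-13, important_instructions attack.
ROWS = [
    Row("gpt-4o-2024-05-13", "None", "important_instructions", 69.07, 44.99, 53.10),
    Row("gpt-4o-2024-05-13", "repeat_user_prompt", "important_instructions", 84.54, 65.66, 30.84),
    Row("gpt-4o-2024-05-13", "transformers_pi_detector", "important_instructions", 41.24, 20.67, 7.95),
    Row("gpt-4o-2024-05-13", "tool_filter", "important_instructions", 72.16, 53.26, 7.47),
    Row("gpt-4o-2024-05-13", "spotlighting_with_delimiting", "important_instructions", 72.16, 50.24, 46.26),
]


def utility_drop(row: Row) -> float:
    """Percentage points of utility lost when the attack is present."""
    return round(row.utility - row.utility_under_attack, 2)


def rank_by_asr(rows: List[Row]) -> List[Row]:
    """Sort rows from lowest to highest targeted ASR."""
    return sorted(rows, key=lambda r: r.targeted_asr)


if __name__ == "__main__":
    for row in rank_by_asr(ROWS):
        print(f"{row.defense:<30} ASR={row.targeted_asr:6.2f}%  "
              f"utility drop={utility_drop(row):6.2f} pts")
```

Sorting by targeted ASR alone hides the cost of a defense, so the sketch prints the utility drop alongside it. For example, transformers_pi_detector has a low ASR but also the lowest utility of these rows even without an attack.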