Article | Dec 2, 2024 | Gold OA

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Helmholtz Center for Information Security

Indexed in Crossref

Abstract

The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. We also observe that jailbreak prompts increasingly shift from online Web communities…

Citation impact

Total citations: 163
FWCI: 50.77
Percentile: 100%
References: 54

Authors

5

Topics & keywords

Keywords
  • Harm
  • Set (abstract data type)
  • Adversarial system
  • Privilege (computing)
  • Computer science
  • SAFER
  • Internet privacy
  • Computer security

Funding