Agents of Chaos
Indexed in DataCite
Abstract
We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource…
Citation impact
5 total citations
- FWCI: n/a
- Percentile: n/a
- References: 0
Too recent for citation history.
Authors
38
Topics & keywords
Topics
Keywords
- Warrant
- Adversarial system
- Task (project management)
- Software deployment
- Identity (music)
- Resource (disambiguation)
- Event (particle physics)
- State (computer science)