Safeguarding large language models: a survey
University of Liverpool · Université Stendhal – Grenoble 3 · +4 more institutions
Abstract
In the burgeoning field of Large Language Models (LLMs), developing robust safety mechanisms, colloquially known as "safeguards" or "guardrails", has become imperative to ensure that LLMs are used ethically and within prescribed boundaries. This article provides a systematic literature review of the current status of these critical mechanisms. It discusses their major challenges and how they can be enhanced into a comprehensive mechanism that addresses ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms employed by major LLM service providers and the open-source community. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable…
Citation impact
- FWCI: 78.66
- Percentile: 100%
- References: 96
Authors: 12