Safeguarding large language models: a survey

Dong, Yi; Mu, Ronghui; Zhang, Yanghao; Sun, Siqi; Zhang, Tianle; Wu, Changshun; Jin, Gaojie; Qi, Yi; Hu, Jinwei; Meng, Jie; Bensalem, Saddek; Huang, Xiaowei

doi:10.1007/s10462-025-11389-2

Article · Artificial Intelligence Review · Oct 17, 2025 · Hybrid OA

University of Liverpool · Université Stendhal – Grenoble 3 · +4 more institutions

Indexed in: Crossref, PubMed

Abstract

In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable…

Citation impact

Total citations: 42

FWCI: 78.66
Percentile: 100%
References: 96

Authors: 12

Topics & keywords

Keywords

Safeguarding
Field (mathematics)
Service (business)
Mechanism (biology)
Service provider
Work (physics)

Funding

Engineering and Physical Sciences Research Council
Award: EP/T026995/1