A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

📅 2025-01-16
🤖 AI Summary
To address cross-stage risks in large language models (LLMs)—including privacy leakage, hallucination generation, value misalignment, malicious misuse, and jailbreaking attacks—this paper proposes the first unified responsible governance framework spanning four critical stages: data acquisition, alignment fine-tuning, prompt-based inference, and post-hoc auditing. Methodologically, it integrates differential privacy, retrieval-augmented generation (RAG), reinforcement learning from human feedback (RLHF) with proximal policy optimization (PPO), chain-of-thought prompting, self-consistency verification, adversarial testing, and explainable auditing. These techniques jointly enforce privacy preservation, hallucination suppression, value alignment, toxicity mitigation, and jailbreak resistance. Furthermore, the work constructs a structured knowledge graph and a comprehensive technical roadmap, transcending unidimensional risk analysis. The framework delivers systematic theoretical foundations and actionable guidelines for developing and deploying secure, trustworthy, and controllable LLMs.
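Of the techniques the summary lists, self-consistency verification is perhaps the simplest to illustrate: sample several candidate answers and keep the one with the highest agreement. A minimal sketch follows; the `sample_answers` callable is a hypothetical stand-in for repeated LLM calls at nonzero temperature, not an API from the paper.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_answers, n_samples=5):
    """Sample several candidate answers and return the majority answer.

    `sample_answers` is any zero-argument callable returning one candidate
    answer per call; in practice it would invoke an LLM stochastically.
    Returns the most frequent answer and its agreement ratio.
    """
    answers = [sample_answers() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

# Hypothetical stand-in for stochastic LLM outputs:
fake = cycle(["42", "42", "41", "42", "40"])
answer, agreement = self_consistency(lambda: next(fake), n_samples=5)
# answer == "42", agreement == 0.6
```

A low agreement ratio can be used as a signal to abstain or re-query, which is how self-consistency supports hallucination suppression.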

📝 Abstract
While large language models (LLMs) hold significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face considerable challenges from the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, and they can be maliciously exploited to generate toxic content and serve unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.
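A common building block in the privacy-protection literature the survey covers is differential privacy. As a minimal sketch (not the paper's own method), the Laplace mechanism adds noise scaled to a query's L1 sensitivity over its privacy budget epsilon:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return an epsilon-DP noisy answer for a query with the given L1 sensitivity."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of Laplace(0, scale) noise.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Example: a count query (sensitivity 1) answered with budget epsilon = 1.
noisy_count = laplace_mechanism(10.0, sensitivity=1.0, epsilon=1.0)
```

Smaller epsilon means stronger privacy but noisier answers; in LLM pipelines the same trade-off appears when DP noise is injected during pre-training or fine-tuning on sensitive data.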
Problem

Research questions and friction points this paper is trying to address.

Privacy Leakage
Inaccurate Responses
Ethical and Illegal Uses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Comprehensive Review
Ethical Use Framework
👥 Authors

Huandong Wang
Department of Electronic Engineering, Tsinghua University
mobile big data mining, social media analysis, software-defined networks

Wenjie Fu
Ph.D., Southeast University
VLSI design and test automation

Yingzhou Tang
Department of Electronic Engineering, Tsinghua University, China

Zhilong Chen
Tsinghua University
Social Computing, Computational Social Science

Yuxi Huang
Unknown affiliation
Generative Retrieval, LLM-based Recommendation, Personalization of LLMs

J. Piao
Department of Electronic Engineering, Tsinghua University, China

Chen Gao
BNRist, Tsinghua University, China

Fengli Xu
Tsinghua University
LLM Agent, Data Science, Social Computing, Science of Science, Urban Science

Tao Jiang
Huazhong University of Science and Technology, China

Yong Li
Department of Electronic Engineering, Tsinghua University, China