Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

📅 2025-06-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit insufficient robustness against toxic prompts, input noise, and out-of-distribution (OOD) scenarios. Method: This work formally defines LLM robustness for the first time and establishes a unified conceptual framework and comprehensive taxonomy covering adversarial perturbations, OOD generalization, and evaluation dimensions. Through systematic literature review and conceptual modeling, it synthesizes over 100 representative studies to propose a structured terminology system and a methodological map. Contribution/Results: We release an open-source, searchable repository (on GitHub) comprising authoritative benchmark datasets, evaluation metrics, tool inventories, and best-practice guidelines. This resource advances standardization in LLM robustness research, providing both theoretical foundations and practical support for rigorous evaluation, robustness enhancement, and cross-study comparison.

📝 Abstract
Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural language. With rapid development and wide-ranging applications (e.g., Agents, Embodied Intelligence), the robustness of LLMs has received increased attention. As the core brain of many AI applications, robust LLMs should not only generate consistent content, but also ensure the correctness and stability of generated content when dealing with unexpected application scenarios (e.g., toxic prompts, limited noisy domain data, out-of-distribution (OOD) applications, etc.). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods around this field and facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey paper. Then, based on the types of perturbed inputs, we organize this survey from the following perspectives: 1) Adversarial Robustness: tackling the problem of intentionally manipulated prompts, such as noisy prompts, long context, data attacks, etc.; 2) OOD Robustness: dealing with unexpected real-world application scenarios, such as OOD detection, zero-shot transfer, hallucinations, etc.; 3) Evaluation of Robustness: summarizing new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing the representative work in each perspective, we discuss and highlight future opportunities and research directions in this field. Meanwhile, we also organize related works and provide an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM robustness under adversarial and OOD scenarios
Improving the consistency and correctness of LLM-generated content
Developing evaluation datasets and metrics for verifying LLM robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveying adversarial robustness techniques for LLMs
Exploring OOD robustness in unexpected real-world scenarios
Summarizing evaluation datasets, metrics, and tools, with a searchable open-source repository