🤖 AI Summary
Large language models (LLMs) exhibit insufficient robustness against toxic prompts, input noise, and out-of-distribution (OOD) scenarios. Method: This work formally defines LLM robustness for the first time and establishes a unified conceptual framework and comprehensive taxonomy covering adversarial perturbations, OOD generalization, and evaluation dimensions. Through systematic literature review and conceptual modeling, it synthesizes over 100 representative studies into a structured terminology system and a methodological map. Contribution/Results: The authors release an open-source, searchable GitHub repository comprising authoritative benchmark datasets, evaluation metrics, tool inventories, and best-practice guidelines. This resource advances standardization in LLM robustness research, providing both theoretical foundations and practical support for rigorous evaluation, robustness enhancement, and cross-study comparison.
📝 Abstract
Large Language Models (LLMs) have gained enormous attention in recent years due to their capability to understand and generate natural language. With their rapid development and wide-ranging applications (e.g., agents, embodied intelligence), the robustness of LLMs has received increasing attention. As the core brain of many AI applications, LLMs should not only generate consistent content, but also ensure the correctness and stability of the generated content when dealing with unexpected application scenarios (e.g., toxic prompts, limited noisy domain data, out-of-distribution (OOD) applications, etc.). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods in this field and to facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey. Then, based on the types of perturbed inputs, we organize this survey from the following perspectives: 1) Adversarial Robustness: tackling intentionally manipulated prompts, such as noisy prompts, long contexts, and data attacks; 2) OOD Robustness: dealing with unexpected real-world application scenarios, such as OOD detection, zero-shot transfer, and hallucinations; 3) Evaluation of Robustness: summarizing the new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing representative work from each perspective, we discuss and highlight future opportunities and research directions in this field. We also organize related works into an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.
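As a minimal illustration of the "noisy prompts" setting that the adversarial-robustness perspective covers, the sketch below injects character-level typo noise into a clean prompt. This is a hypothetical helper for building perturbed evaluation inputs, not a method proposed in the survey; the function name and parameters are our own.

```python
import random

def perturb_prompt(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace each alphabetic character with probability `rate`,
    simulating typo-style noise used in robustness evaluations."""
    rng = random.Random(seed)  # seeded for reproducible perturbations
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

clean = "Summarize the following article in one sentence."
noisy = perturb_prompt(clean, rate=0.15)
print(noisy)  # a typo-perturbed variant of the clean prompt
```

A robustness evaluation would then compare model outputs on `clean` versus `noisy` variants, e.g., measuring how often the answer changes as `rate` grows.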