Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Emerging backdoor attacks exploiting the reasoning capabilities of large language models (LLMs) pose novel security threats, yet systematic understanding remains lacking. Method: This paper presents the first comprehensive survey of “reasoning-based backdoor attacks,” integrating literature analysis with mechanistic modeling to propose a novel three-category taxonomy—associative, passive, and active—within a unified analytical framework that explicitly links attack mechanisms to defense strategies. Contribution/Results: The study identifies critical limitations of existing defenses under dynamic reasoning paths, clarifies unresolved challenges, and outlines concrete future research directions. By bridging a significant gap in LLM reasoning security surveys, this work establishes a foundational reference for both theoretical advancement and practical mitigation of reasoning-driven vulnerabilities, thereby supporting the development of trustworthy and controllable LLMs.

📝 Abstract
With the rise of advanced reasoning capabilities, large language models (LLMs) are receiving increasing attention. However, although reasoning improves LLMs' performance on downstream tasks, it also introduces new security risks, as adversaries can exploit these capabilities to conduct backdoor attacks. Existing surveys on backdoor attacks and reasoning security offer comprehensive overviews but lack in-depth analysis of backdoor attacks and defenses targeting LLMs' reasoning abilities. In this paper, we take the first step toward providing a comprehensive review of reasoning-based backdoor attacks in LLMs by analyzing their underlying mechanisms, methodological frameworks, and unresolved challenges. Specifically, we introduce a new taxonomy that offers a unified perspective for summarizing existing approaches, categorizing reasoning-based backdoor attacks into associative, passive, and active. We also present defense strategies against such attacks and discuss current challenges alongside potential directions for future research. This work offers a novel perspective, paving the way for further exploration of secure and trustworthy LLM communities.
Problem

Research questions and friction points this paper is trying to address.

Surveying reasoning-based backdoor attacks in large language models
Analyzing attack mechanisms and defense strategies for LLMs
Addressing security risks from reasoning capabilities in language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a new taxonomy for reasoning-based backdoor attacks
Categorizes attacks into associative, passive, and active types
Presents defense strategies against reasoning-based backdoor attacks
Man Hu
Beijing Electronic Science and Technology Institute, China
Xinyi Wu
Nanyang Technological University, Singapore
Zuofeng Suo
Hainan University, China
Jinbo Feng
Beijing Electronic Science and Technology Institute, China
Linghui Meng
Institute of Automation, Chinese Academy of Sciences, China
Yanhao Jia
Nanyang Technological University, Singapore
Anh Tuan Luu
Nanyang Technological University, Singapore
Shuai Zhao
Nanyang Technological University, Singapore