🤖 AI Summary
This work addresses a critical security gap in existing backdoor attacks against large language models, which predominantly rely on user-visible prompt-based triggers and overlook the risks posed by structural signals in multi-turn dialogues. The study proposes, for the first time, using dialogue turn indices as implicit, structured triggers to activate backdoors without requiring any user input, thereby circumventing the limitations of conventional prompt-dependent attacks. Through fine-tuning-based backdoor injection and evaluation in multi-turn interaction scenarios, the method achieves an average attack success rate of 99.52% across four mainstream open-source large language models. It maintains robust performance under five representative defenses, with a success rate of 98.04%, and demonstrates strong generalization, revealing dialogue structure as a novel and potent attack surface.
📝 Abstract
Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. Across four widely used open-source LLM models, TST achieves an average attack success rate (ASR) of 99.52% with minimal utility degradation, and remains effective under five representative defenses with an average ASR of 98.04%. The attack also generalizes well across instruction datasets, maintaining an average ASR of 99.19%. Our results suggest that dialogue structure constitutes an important and under-studied attack surface for multi-turn LLM systems, motivating structure-aware auditing and mitigation in practice.