🤖 AI Summary
This work addresses a key limitation of existing backdoor attacks on vision-language-action (VLA) models: the injected backdoors are easily erased during user-side fine-tuning, rendering them ineffective in practice. To overcome this, the authors propose INFUSE, a framework that uses parameter sensitivity analysis to identify the modules least affected by fine-tuning and injects backdoors into these stable components while freezing the remaining parameters. This strategy ensures that malicious behavior persists across arbitrary downstream fine-tuning scenarios—a first for VLA backdoor attacks—highlighting critical security risks in the foundation-model distribution phase. Experimental results demonstrate that INFUSE achieves post-fine-tuning attack success rates of 91.0% in simulation and 79.8% on real robotic tasks, substantially outperforming BadVLA while maintaining high performance on benign tasks.
📝 Abstract
Vision-Language-Action (VLA) models have become foundational to modern embodied AI systems. By integrating visual perception, language understanding, and action planning, they enable general-purpose task execution across diverse environments. Despite their importance, the security of VLA models remains underexplored -- particularly in the context of backdoor attacks, which pose realistic threats in physical-world deployments. While recent methods attempt to inject backdoors into VLA models, these backdoors are easily erased during downstream adaptation: user-side fine-tuning with clean data significantly alters model parameters, rendering such attacks impractical in real-world settings. To address this challenge, we propose INFUSE (INjection into Fine-tUne-inSensitive modulEs), the first backdoor attack framework for VLA base models that remains effective even under arbitrary user fine-tuning. INFUSE begins by analyzing parameter sensitivity across diverse fine-tuning scenarios to identify modules that remain largely unchanged -- the fine-tune-insensitive modules. It then injects backdoors into these stable modules while freezing the rest, ensuring malicious behavior persists after extensive user fine-tuning. Comprehensive experiments across multiple VLA architectures demonstrate INFUSE's effectiveness. After user-side fine-tuning, INFUSE maintains mean attack success rates of 91.0% in simulation environments and 79.8% on real-world robot tasks, substantially surpassing BadVLA (38.8% and 36.6%, respectively), while preserving clean-task performance comparable to standard models. These results uncover a critical threat: backdoors implanted before distribution can persist through fine-tuning and remain effective at deployment.
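The module-selection step described above can be illustrated with a minimal sketch: measure each module's parameter drift across several clean fine-tuning runs, then pick the least-changed (fine-tune-insensitive) modules as injection targets. This is an assumption-laden toy, not the paper's implementation -- the module names, the drift metric (mean absolute parameter change), and the noise scales below are all hypothetical.

```python
import numpy as np

def module_sensitivity(base_params, finetuned_runs):
    """Mean absolute parameter drift per module, averaged over fine-tuning runs.

    base_params: dict mapping module name -> base-model weight array.
    finetuned_runs: list of dicts with the same keys, one per clean fine-tuning run.
    """
    sens = {}
    for name, base in base_params.items():
        drifts = [np.abs(run[name] - base).mean() for run in finetuned_runs]
        sens[name] = float(np.mean(drifts))
    return sens

def select_insensitive_modules(sens, k):
    """Return the k modules with the smallest drift -- candidates for injection."""
    return sorted(sens, key=sens.get)[:k]

# Toy example with hypothetical module names and drift magnitudes.
rng = np.random.default_rng(0)
base = {name: rng.normal(size=8)
        for name in ("vision.proj", "lang.mlp", "action.head")}

# Simulate three clean fine-tuning runs: vision.proj barely moves,
# while lang.mlp and action.head shift substantially.
runs = []
for _ in range(3):
    runs.append({
        "vision.proj": base["vision.proj"] + rng.normal(scale=0.01, size=8),
        "lang.mlp":    base["lang.mlp"]    + rng.normal(scale=0.5,  size=8),
        "action.head": base["action.head"] + rng.normal(scale=1.0,  size=8),
    })

sens = module_sensitivity(base, runs)
stable = select_insensitive_modules(sens, k=1)
print(stable)  # -> ['vision.proj']
```

In the attack as described, the backdoor objective would then be trained only on the selected stable modules while all other parameters stay frozen, so that subsequent clean fine-tuning (which concentrates its updates elsewhere) leaves the implanted behavior largely intact.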