Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of effective ownership verification mechanisms for vision-language-action (VLA) models that are shared and adapted, proposing GuardVLA, the first backdoor-based watermarking framework tailored to VLA models. During training, GuardVLA embeds a covert and benign backdoor watermark by injecting secret information into embodied visual data. After the model is released, ownership is verified through a "swap-and-detect" protocol that pairs a trigger projector with an external classification head. Extensive experiments show that GuardVLA reliably authenticates model ownership across diverse datasets, architectures, and adaptation settings while preserving original task performance, and that the watermark remains detectable even after subsequent fine-tuning.
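
The summary does not describe how the secret information is encoded into the visual data. Purely to make the idea concrete, the sketch below assumes a toy additive scheme: a secret message is converted to bits, tiled over a grayscale frame as a faint ±alpha pattern, and applied to a reproducible fraction of training frames. The names message_to_bits, embed_trigger, and select_watermark_indices, the blending strength alpha, and the watermarking rate are all hypothetical; this is not GuardVLA's actual trigger encoder.

```python
# Hypothetical toy sketch of watermark-trigger injection; NOT GuardVLA's encoder.
import random
from typing import List

Image = List[List[float]]  # grayscale frame, pixel values in [0, 1]


def message_to_bits(message: str) -> List[int]:
    """Encode a secret message as bits, 8 per UTF-8 byte, most significant bit first."""
    return [(byte >> i) & 1 for byte in message.encode("utf-8") for i in range(7, -1, -1)]


def embed_trigger(image: Image, message: str, alpha: float = 0.05) -> Image:
    """Return a new frame with a faint, message-dependent pattern blended in.

    Toy encoder (assumption): bits are tiled row-major over the frame; each pixel
    is nudged by +alpha for a 1-bit or -alpha for a 0-bit, then clipped to [0, 1].
    The input frame is not modified.
    """
    bits = message_to_bits(message)
    if not bits:
        raise ValueError("message must be non-empty")
    width = len(image[0])
    return [
        [
            min(1.0, max(0.0, px + (alpha if bits[(r * width + c) % len(bits)] else -alpha)))
            for c, px in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]


def select_watermark_indices(num_frames: int, rate: float, seed: int) -> List[int]:
    """Pick a reproducible subset of training frames to watermark (hypothetical rate)."""
    k = max(1, round(num_frames * rate))
    return sorted(random.Random(seed).sample(range(num_frames), k))
```

Keeping the message and seed private is what would let only the owner regenerate the trigger later; a practical scheme would need a far less conspicuous encoding than this additive pattern.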

📝 Abstract

Vision-Language-Action models (VLAs) support generalist robotic control by learning end-to-end decision policies directly from multi-modal inputs. As trained VLAs are increasingly shared and adapted, protecting model ownership becomes essential for secure deployment and responsible open-source usage. In this paper, we present GuardVLA, the first backdoor-based ownership verification framework specifically designed for VLAs. GuardVLA embeds a stealthy and harmless backdoor watermark into the protected model during training by injecting secret messages into embodied visual data. For post-release verification, we propose a swap-and-detect mechanism, in which a trigger projector and an external classifier head are used to activate and detect the embedded backdoor based on prediction probabilities. Extensive experiments across multiple datasets, model architectures, and adaptation settings demonstrate that GuardVLA enables reliable ownership verification while preserving benign task performance. Further results show that the embedded watermark remains detectable under post-release model adaptation.
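
The abstract states only that the swap-and-detect step decides ownership "based on prediction probabilities"; it does not say which statistical test is used. As an illustration, the sketch below assumes the verifier has, for a set of probe frames, the probability that the external classifier head assigns to a watermark target class with and without the trigger, and applies an exact one-sided sign test. The function names, the target-class setup, and the significance level alpha = 0.01 are hypothetical stand-ins (a paired t-test would be a common alternative).

```python
# Hypothetical sketch of a probability-based ownership decision; NOT GuardVLA's exact test.
from math import comb
from typing import Sequence, Tuple


def sign_test_pvalue(wins: int, n: int) -> float:
    """Exact one-sided p-value P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def verify_ownership(
    p_triggered: Sequence[float],
    p_benign: Sequence[float],
    alpha: float = 0.01,
) -> Tuple[bool, float]:
    """Paired sign test on target-class probabilities from the swapped-in head.

    p_triggered[i] and p_benign[i] are the probabilities the external classifier
    head assigns to the (hypothetical) watermark target class for the i-th probe
    frame with and without the trigger applied.
    H0: the trigger does not systematically raise that probability.
    Returns (ownership_claimed, p_value).
    """
    if len(p_triggered) != len(p_benign) or not p_triggered:
        raise ValueError("need equal-length, non-empty probability lists")
    diffs = [t - b for t, b in zip(p_triggered, p_benign)]
    nonzero = [d for d in diffs if d != 0.0]  # ties carry no sign information
    if not nonzero:
        return False, 1.0
    wins = sum(1 for d in nonzero if d > 0)  # frames where the trigger raised the probability
    p_value = sign_test_pvalue(wins, len(nonzero))
    return p_value < alpha, p_value
```

A sign test is used here only because it needs nothing beyond the Python standard library and makes no normality assumption about the per-frame probability gaps; for example, 20 probe frames that all show a higher target-class probability under the trigger give p = 1/2^20.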

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

model ownership

backdoor

watermark

ownership verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor watermarking

ownership verification

vision-language-action models