Dataset Poisoning Attacks on Behavioral Cloning Policies

πŸ“… 2025-11-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work systematically investigates the vulnerability of Behavior Cloning (BC) policies to clean-label backdoor attacks, a previously unexplored threat in imitation learning. The attacker injects visually subtle triggers into a small fraction of expert demonstrations, inducing spurious state-action correlations that severely degrade performance on triggered inputs during deployment while preserving high accuracy on clean test data. To maximize attack efficacy, the authors propose an entropy-driven dynamic trigger mechanism that adaptively identifies high-uncertainty critical states at test time. Experiments demonstrate that the attack reduces BC success rates by over 80% under triggered conditions with as little as a 1% poisoning rate, exposing critical robustness deficiencies. The study uncovers a fundamental security blind spot in imitation learning concerning data provenance and trustworthiness, and establishes a benchmark and analytical framework for backdoor detection and defense in behavioral cloning.

πŸ“ Abstract
Behavior Cloning (BC) is a popular framework for training sequential decision policies from expert demonstrations via supervised learning. As these policies are increasingly being deployed in the real world, their robustness and potential vulnerabilities are an important concern. In this work, we perform the first analysis of the efficacy of clean-label backdoor attacks on BC policies. Our backdoor attacks poison a dataset of demonstrations by injecting a visual trigger to create a spurious correlation that can be exploited at test time. We evaluate how policy vulnerability scales with the fraction of poisoned data, the strength of the trigger, and the trigger type. We also introduce a novel entropy-based test-time trigger attack that substantially degrades policy performance by identifying critical states where triggering the backdoor is expected to be most effective. We empirically demonstrate that BC policies trained on even minimally poisoned datasets exhibit deceptively high, near-baseline task performance despite being highly vulnerable to backdoor trigger attacks during deployment. Our results underscore the urgent need for more research into the robustness of BC policies, particularly as large-scale datasets are increasingly used to train policies for real-world cyber-physical systems. Videos and code are available at https://sites.google.com/view/dataset-poisoning-in-bc.
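The clean-label poisoning described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `poison_demonstrations`, the corner-patch trigger, and all parameters are assumptions; the key property is that only the observations are modified while the (correct) expert actions are left untouched, which is what makes the attack clean-label.

```python
import numpy as np

def poison_demonstrations(observations, actions, poison_frac=0.01,
                          patch_size=8, patch_value=1.0, seed=0):
    """Clean-label poisoning sketch: stamp a small visual trigger patch
    onto a fraction of expert frames, keeping the expert actions as-is.

    observations: array of shape (N, H, W, C); actions: array of length N.
    Returns the poisoned observations, the unchanged actions, and the
    indices of the poisoned frames.
    """
    rng = np.random.default_rng(seed)
    obs = observations.copy()
    n = len(obs)
    num_poison = max(1, int(poison_frac * n))
    idx = rng.choice(n, size=num_poison, replace=False)
    for i in idx:
        # Hypothetical trigger: a solid square in the top-left corner.
        obs[i, :patch_size, :patch_size, :] = patch_value
    return obs, actions, idx
```

Because the labels stay correct, a poisoned dataset passes casual inspection, and a BC policy trained on it retains near-baseline clean performance while silently learning the trigger-action correlation.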
Problem

Research questions and friction points this paper is trying to address.

Analyzing clean-label backdoor attacks on behavioral cloning policies
Evaluating policy vulnerability to dataset poisoning with visual triggers
Demonstrating deceptive high performance despite backdoor attack susceptibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clean-label backdoor attacks poison behavioral cloning datasets
Visual triggers create spurious correlations for test-time exploitation
Entropy-based method identifies critical states for performance degradation
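The entropy-based trigger selection in the last bullet can be illustrated with a short sketch. The details are assumptions (the paper's exact criterion is not reproduced here): the idea is that states where the policy's action distribution has high Shannon entropy are treated as critical, and the visual trigger is applied only there at test time.

```python
import numpy as np

def action_entropy(probs, eps=1e-12):
    """Shannon entropy of a policy's action distribution (last axis)."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def should_trigger(policy_probs, threshold):
    """Hypothetical criterion: apply the backdoor trigger only at
    high-uncertainty (critical) states, where the attack is expected
    to flip the policy's action most easily."""
    return action_entropy(policy_probs) >= threshold
```

For example, a uniform distribution over 4 actions has entropy ln 4 ≈ 1.39 and would be triggered at a threshold of 1.0, while a confidently peaked distribution would not.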
πŸ”Ž Similar Papers
No similar papers found.