Counterfactual Behavior Cloning: Offline Imitation Learning from Imperfect Human Demonstrations

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses offline imitation learning from imperfect human demonstrations (noisy actions and suboptimal policies) by proposing Counterfactual Behavior Cloning (Counter-BC). The method explicitly models and corrects demonstration bias through counterfactual action generation, producing semantically consistent augmentations that disentangle the demonstrator's underlying intent policy from superficial behavioral traces. Built on the behavior cloning framework, Counter-BC integrates counterfactual action sampling, consistency regularization, and a theoretically grounded policy disentanglement mechanism to enable robust optimization in the offline setting. Experiments on both simulated and real-world robotic platforms show that Counter-BC significantly outperforms existing baselines: it reliably recovers concise, consistent, and generalizable policies from highly noisy, multi-user, and low-skill demonstrations, without requiring online interaction or reward labels.

📝 Abstract
Learning from humans is challenging because people are imperfect teachers. When everyday humans show the robot a new task they want it to perform, humans inevitably make errors (e.g., inputting noisy actions) and provide suboptimal examples (e.g., overshooting the goal). Existing methods learn by mimicking the exact behaviors the human teacher provides -- but this approach is fundamentally limited because the demonstrations themselves are imperfect. In this work we advance offline imitation learning by enabling robots to extrapolate what the human teacher meant, instead of only considering what the human actually showed. We achieve this by hypothesizing that all of the human's demonstrations are trying to convey a single, consistent policy, while the noise and sub-optimality within their behaviors obfuscates the data and introduces unintentional complexity. To recover the underlying policy and learn what the human teacher meant, we introduce Counter-BC, a generalized version of behavior cloning. Counter-BC expands the given dataset to include actions close to behaviors the human demonstrated (i.e., counterfactual actions that the human teacher could have intended, but did not actually show). During training Counter-BC autonomously modifies the human's demonstrations within this expanded region to reach a simple and consistent policy that explains the underlying trends in the human's dataset. Theoretically, we prove that Counter-BC can extract the desired policy from imperfect data, multiple users, and teachers of varying skill levels. Empirically, we compare Counter-BC to state-of-the-art alternatives in simulated and real-world settings with noisy demonstrations, standardized datasets, and real human teachers. See videos of our work here: https://youtu.be/XaeOZWhTt68
Problem

Research questions and friction points this paper is trying to address.

Overcoming imperfect human demonstrations in robot learning
Extracting consistent policy from noisy suboptimal human actions
Enhancing offline imitation learning with counterfactual behavior cloning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extrapolates human intent from imperfect demonstrations
Expands dataset with counterfactual plausible actions
Autonomously simplifies noisy data to consistent policy
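The steps in the bullets above (expand each demonstrated action into a neighborhood of counterfactual candidates, then let training settle on the candidates a single simple policy best explains) can be sketched on a toy problem. This is a hedged illustration only, not the paper's implementation: the neighborhood radius `EPS`, the candidate count, and the alternating relabel-and-refit loop are assumptions standing in for the actual Counter-BC objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imperfect demonstrations: the intended policy is a = 2*s,
# but the teacher's actions are corrupted by Gaussian noise.
states = rng.uniform(-1.0, 1.0, size=(200, 1))
actions = 2.0 * states + rng.normal(0.0, 0.3, size=(200, 1))

EPS = 0.5          # assumed radius of the counterfactual region around each action
N_CANDIDATES = 16  # assumed number of counterfactual actions per datapoint

# Expand the dataset: actions the teacher *could* have shown but did not.
candidates = actions[:, None, :] + rng.uniform(
    -EPS, EPS, size=(200, N_CANDIDATES, 1)
)

# Simple linear policy a = w * s. Alternate between (1) relabeling each
# datapoint with the counterfactual candidate closest to the current
# policy's prediction and (2) refitting the policy to the relabeled
# actions by least squares (i.e., behavior cloning on the relabeled data).
w = 0.0
for _ in range(20):
    pred = w * states                              # (200, 1)
    dists = np.abs(candidates - pred[:, None, :])  # (200, 16, 1)
    best = np.take_along_axis(
        candidates, dists.argmin(axis=1, keepdims=True), axis=1
    )
    relabeled = best[:, 0, :]                      # chosen counterfactual actions
    w = float((states * relabeled).sum() / (states ** 2).sum())

print(round(w, 2))  # recovered slope should land near the intended 2.0
```

Because the noise is symmetric and state-independent, relabeling within the counterfactual band pulls the data toward a single consistent linear policy, so the fitted slope approaches the teacher's intent rather than memorizing the noisy actions.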
Shahabedin Sagheb
Assistant Collegiate Professor, Virginia Tech
Robot Learning · Machine Learning · Control Theory · Haptics · Game Theory
Dylan P. Losey
Virginia Tech, Department of Mechanical Engineering, 635 Prices Fork Rd, Blacksburg, VA, 24060, USA