🤖 AI Summary
Robots struggle to predict manipulation outcomes in visually ambiguous scenes; while generative models (e.g., diffusion models) are theoretically well suited to this setting, their practical performance is hampered by insufficient exploitation of historical interaction data. To address this, we propose a decoupled "generation–verification" framework: an unconditional diffusion model first samples multiple candidate actions, and a history-aware verifier that explicitly models past interaction sequences then evaluates, filters, and re-ranks these candidates. We theoretically prove that this verification mechanism improves the expected quality of selected actions. Our method integrates multimodal perception and online learning. Extensive experiments on simulated and real-world tasks, including articulated object manipulation, multimodal door opening, and uneven surface grasping, demonstrate significant improvements over state-of-the-art baselines. Results validate that history-guided verification is critical for robust manipulation under visual ambiguity.
📝 Abstract
We introduce a novel History-Aware VErifier (HAVE) to disambiguate uncertain scenarios online by leveraging past interactions. Robots frequently encounter visually ambiguous objects whose manipulation outcomes remain uncertain until the robot physically interacts with them. While generative models alone could theoretically adapt to such ambiguity, in practice they achieve suboptimal performance in ambiguous cases, even when conditioned on action history. To address this, we propose explicitly decoupling action generation from verification: we use an unconditional diffusion-based generator to propose multiple candidate actions and employ our history-aware verifier to select the most promising action by reasoning about past interactions. Through theoretical analysis, we demonstrate that employing a verifier significantly improves expected action quality. Empirical evaluations and analysis across multiple simulated and real-world environments, including articulated objects, multi-modal doors, and uneven object pick-up, confirm the effectiveness of our method and its improvements over baselines. Our project website is available at: https://liy1shu.github.io/HAVE_CoRL25/
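The generate-then-verify loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's method: the "generator" here is random Gaussian sampling rather than a diffusion model, and the "verifier" is a hand-written distance-to-past-failures heuristic rather than a learned history-aware network. Only the overall structure (propose N candidates, score against interaction history, pick the best, update history online) mirrors the framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidate_actions(n_candidates, action_dim=2):
    """Stand-in for the unconditional generator: propose N candidate actions."""
    return rng.normal(size=(n_candidates, action_dim))

def verifier_score(action, history):
    """Stand-in for the history-aware verifier: score a candidate given past
    (action, success) interactions. Toy heuristic: prefer actions far from
    previously failed attempts."""
    failed = np.array([a for a, success in history if not success])
    if failed.size == 0:
        return 0.0  # no failures recorded yet; all candidates tie
    return float(np.min(np.linalg.norm(failed - action, axis=1)))

def select_action(history, n_candidates=16):
    """Generate-then-verify: sample candidates, re-rank by verifier score,
    and return the top-ranked action."""
    candidates = sample_candidate_actions(n_candidates)
    scores = [verifier_score(a, history) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Online loop: each failed attempt enters the history, so subsequent
# selections are re-ranked away from known failures.
history = []
action = select_action(history)
history.append((action, False))        # suppose the first attempt failed
next_action = select_action(history)   # verifier now avoids that region
```

The key design choice shown is the decoupling: the generator never sees the history, so it keeps proposing diverse candidates, while all adaptation to past outcomes lives in the verifier's ranking.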