AI Summary
Robots struggle to predict manipulation outcomes in visually ambiguous scenes; while generative models (e.g., diffusion models) offer theoretical suitability, their practical performance is hampered by insufficient exploitation of historical interaction data. To address this, we propose a decoupled "generation-verification" framework: an unconditional diffusion model first samples multiple candidate actions, and a history-aware verifier, which explicitly models past interaction sequences, then evaluates, filters, and re-ranks these candidates. We theoretically prove that this verification mechanism improves the expected quality of selected actions. Our method integrates multimodal perception and online learning. Extensive experiments on simulated and real-world tasks, including articulated object manipulation, multimodal door opening, and uneven surface grasping, demonstrate significant improvements over state-of-the-art baselines. Results validate that history-guided verification is critical for robust manipulation under visual ambiguity.
Abstract
We introduce a novel History-Aware VErifier (HAVE) to disambiguate uncertain scenarios online by leveraging past interactions. Robots frequently encounter visually ambiguous objects whose manipulation outcomes remain uncertain until physically interacted with. While generative models alone could theoretically adapt to such ambiguity, in practice they obtain suboptimal performance in ambiguous cases, even when conditioned on action history. To address this, we propose explicitly decoupling action generation from verification: we use an unconditional diffusion-based generator to propose multiple candidate actions and employ our history-aware verifier to select the most promising action by reasoning about past interactions. Through theoretical analysis, we demonstrate that employing a verifier significantly improves expected action quality. Empirical evaluations and analysis across multiple simulated and real-world environments including articulated objects, multi-modal doors, and uneven object pick-up confirm the effectiveness of our method and improvements over baselines. Our project website is available at: https://liy1shu.github.io/HAVE_CoRL25/
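The generate-then-verify loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional action space, the uniform sampler standing in for the diffusion generator, the distance-based scoring rule, and all names (`sample_candidates`, `verifier_score`, `select_action`, `interact`) are assumptions made purely for illustration. HAVE itself uses a learned verifier.

```python
import random
from typing import Callable, List, Tuple

Action = float                       # hypothetical 1-D action (e.g. push > 0, pull < 0)
History = List[Tuple[Action, bool]]  # (action tried, whether it succeeded)


def sample_candidates(n: int, rng: random.Random) -> List[Action]:
    """Stand-in for the unconditional generator: proposes actions without seeing history."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def verifier_score(action: Action, history: History) -> float:
    """Toy history-aware score (an assumption, not the learned verifier):
    reward closeness to past successes, penalize closeness to past failures."""
    score = 0.0
    for past_action, succeeded in history:
        closeness = 1.0 / (1.0 + 10.0 * abs(action - past_action))
        score += closeness if succeeded else -closeness
    return score


def select_action(candidates: List[Action], history: History) -> Action:
    """Verification step: keep the candidate the verifier ranks highest."""
    return max(candidates, key=lambda a: verifier_score(a, history))


def interact(try_action: Callable[[Action], bool],
             n_candidates: int = 8, max_steps: int = 5, seed: int = 0) -> History:
    """Online loop: generate, verify, act, record the outcome, repeat until success."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(max_steps):
        candidates = sample_candidates(n_candidates, rng)
        action = select_action(candidates, history)
        succeeded = try_action(action)
        history.append((action, succeeded))
        if succeeded:
            break
    return history


if __name__ == "__main__":
    # Hypothetical ambiguous door that only opens when pulled (action < 0).
    for step, (action, ok) in enumerate(interact(lambda a: a < 0), start=1):
        print(f"step {step}: action={action:+.2f} success={ok}")
```

In this toy setting the generator never changes; only the verifier's ranking shifts as failures accumulate. That mirrors the decoupling the abstract argues for, though the scoring rule here is a placeholder.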