🤖 AI Summary
Existing methods struggle to reliably infer manipulation modes—such as push/pull and left/right side—for hinge-like articulated objects under occlusion, symmetry, or visual ambiguity. To address this, we propose the History-Aware Diffusion Network (HADN), the first approach to integrate diffusion models into multimodal articulated motion pattern modeling. HADN encodes temporal observations to incorporate historical context and formulates a conditional generative framework wherein the multimodal (i.e., multi-modal-peak) distribution of manipulation modes is explicitly modeled during iterative denoising. This design significantly enhances prediction stability and robustness under occlusion. Evaluated on standard articulated object manipulation benchmarks, HADN achieves state-of-the-art performance, improving average manipulation success rate by 12.7% over prior methods—with particularly pronounced gains under severe occlusion conditions.
📝 Abstract
We introduce a novel approach for manipulating articulated objects which are visually ambiguous, such doors which are symmetric or which are heavily occluded. These ambiguities can cause uncertainty over different possible articulation modes: for instance, when the articulation direction (e.g. push, pull, slide) or location (e.g. left side, right side) of a fully closed door are uncertain, or when distinguishing features like the plane of the door are occluded due to the viewing angle. To tackle these challenges, we propose a history-aware diffusion network that can model multi-modal distributions over articulation modes for articulated objects; our method further uses observation history to distinguish between modes and make stable predictions under occlusions. Experiments and analysis demonstrate that our method achieves state-of-art performance on articulated object manipulation and dramatically improves performance for articulated objects containing visual ambiguities. Our project website is available at https://flowbothd.github.io/.