FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation

📅 2024-10-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods struggle to reliably infer manipulation modes—such as push/pull and left/right side—for hinge-like articulated objects under occlusion, symmetry, or visual ambiguity. To address this, we propose the History-Aware Diffusion Network (HADN), the first approach to integrate diffusion models into multimodal articulated motion pattern modeling. HADN encodes temporal observations to incorporate historical context and formulates a conditional generative framework wherein the multimodal (i.e., multi-modal-peak) distribution of manipulation modes is explicitly modeled during iterative denoising. This design significantly enhances prediction stability and robustness under occlusion. Evaluated on standard articulated object manipulation benchmarks, HADN achieves state-of-the-art performance, improving average manipulation success rate by 12.7% over prior methods—with particularly pronounced gains under severe occlusion conditions.

Technology Category

Application Category

📝 Abstract
We introduce a novel approach for manipulating articulated objects which are visually ambiguous, such doors which are symmetric or which are heavily occluded. These ambiguities can cause uncertainty over different possible articulation modes: for instance, when the articulation direction (e.g. push, pull, slide) or location (e.g. left side, right side) of a fully closed door are uncertain, or when distinguishing features like the plane of the door are occluded due to the viewing angle. To tackle these challenges, we propose a history-aware diffusion network that can model multi-modal distributions over articulation modes for articulated objects; our method further uses observation history to distinguish between modes and make stable predictions under occlusions. Experiments and analysis demonstrate that our method achieves state-of-art performance on articulated object manipulation and dramatically improves performance for articulated objects containing visual ambiguities. Our project website is available at https://flowbothd.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Operational Uncertainty
Complex Objects Manipulation
Occlusion Handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

FlowBotHD
Multi-Modal Object Manipulation
Occlusion Robustness
Y
Yishu Li
Robotics Institute, School of Computer Science, Carnegie Mellon University, United States
W
Wen Hui Leng
Robotics Institute, School of Computer Science, Carnegie Mellon University, United States
Y
Yiming Fang
Robotics Institute, School of Computer Science, Carnegie Mellon University, United States
Ben Eisner
Ben Eisner
ML/Robotics Researcher, PhD Student at Carnegie Mellon
Deep LearningMachine LearningArtificial IntelligenceRobotics
David Held
David Held
Associate Professor in the Robotics Institute, Carnegie Mellon University
RoboticsComputer VisionMachine LearningDeep LearningReinforcement Learning