Pandora: Articulated 3D Scene Graphs from Egocentric Vision

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing robotic mapping approaches struggle to fully perceive the state and structure of articulated objects, such as drawers and cabinet doors, resulting in incomplete semantic scene representations. This work proposes a method that leverages first-person visual data, collected by humans wearing Project Aria glasses, to recover the 3D geometry and kinematic structure of such movable components through heuristic motion analysis. The recovered articulation models are then integrated into a 3D scene graph, enabling knowledge transfer from human exploratory behavior to robotic systems. Evaluated on a Boston Dynamics Spot robot, the approach significantly improves task success rates, particularly for retrieving concealed objects, when the robot operates solely from the enhanced scene graph, addressing a key limitation of conventional autonomous mapping in understanding dynamic scene structure.
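The summary describes the heuristics only at a high level. As a minimal sketch of one plausible form such a heuristic could take, the snippet below classifies a tracked part as prismatic (drawer-like) or revolute (door-like) by comparing a line fit and a circle fit to the part's observed trajectory. The function names and the point-trajectory input are illustrative assumptions, not the authors' code; a real system would likely use full 6-DoF part poses and robust estimation.

```python
# Hypothetical sketch: classify a moving part's articulation from the
# trajectory of a point tracked on it across egocentric frames.
import numpy as np

def fit_line(points):
    """Least-squares line fit; returns (origin, direction, residual RMS)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    direction = vt[0]                       # principal direction of motion
    proj = (points - centroid) @ direction  # scalar position along the line
    residuals = points - (centroid + np.outer(proj, direction))
    return centroid, direction, np.sqrt((residuals ** 2).sum(axis=1).mean())

def fit_circle(points):
    """Fit a plane, then a circle in it; returns (axis, residual RMS)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[2]                          # plane normal = rotation axis
    u, v = vt[0], vt[1]                     # in-plane basis
    xy = np.stack([(points - centroid) @ u, (points - centroid) @ v], axis=1)
    # Algebraic (Kasa) circle fit: x^2 + y^2 = 2*cx*x + 2*cy*y + c.
    A = np.column_stack([2 * xy, np.ones(len(xy))])
    b = (xy ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    radius = np.sqrt(c + cx ** 2 + cy ** 2)
    residuals = np.linalg.norm(xy - [cx, cy], axis=1) - radius
    return normal, np.sqrt((residuals ** 2).mean())

def classify_articulation(trajectory):
    """trajectory: (N, 3) positions of a point on the moving part."""
    _, axis, line_rms = fit_line(trajectory)
    rot_axis, circle_rms = fit_circle(trajectory)
    # A near-linear trajectory also fits a very large circle, so the
    # tie-break deliberately prefers the prismatic model.
    if line_rms <= circle_rms:
        return {"type": "prismatic", "axis": axis}
    return {"type": "revolute", "axis": rot_axis}
```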
📝 Abstract
Robotic mapping systems typically build metric-semantic scene representations from the robot's own sensors and cameras. However, these "first person" maps inherit the robot's own limitations of embodiment or skillset, which may leave many aspects of the environment unexplored. For example, the robot might not be able to open drawers or access wall cabinets. In this sense, the map representation is incomplete and requires a more capable robot to fill in the gaps. We narrow these blind spots in current methods by leveraging egocentric data captured as a human naturally explores a scene wearing Project Aria glasses, giving a way to directly transfer knowledge about articulation from the human to any deployable robot. We demonstrate that, using simple heuristics, we can leverage egocentric data to recover models of articulated object parts with quality comparable to those of state-of-the-art methods based on other input modalities. We also show how to integrate these models into 3D scene graph representations, leading to a better understanding of object dynamics and object-container relationships. We finally demonstrate that these articulated 3D scene graphs enhance a robot's ability to perform mobile manipulation tasks, showcasing an application where a Boston Dynamics Spot is tasked with retrieving concealed target items, given only the 3D scene graph as input.
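The abstract does not specify the scene-graph schema. Below is a minimal, hypothetical sketch of how a recovered joint model and an object-container link might be attached to a scene graph node, so that a robot planning a retrieval task can read both directly from the graph. The `Joint` and `SceneNode` classes and all field names are illustrative assumptions.

```python
# Hypothetical sketch of an articulated 3D scene graph node (Python 3.10+).
from dataclasses import dataclass, field

@dataclass
class Joint:
    type: str            # "prismatic" or "revolute"
    axis: tuple          # unit vector in the parent frame
    origin: tuple        # a point on the joint axis, parent frame
    limits: tuple        # (min, max) in meters (prismatic) or radians (revolute)
    state: float = 0.0   # current opening, within limits

@dataclass
class SceneNode:
    name: str
    children: list = field(default_factory=list)
    joint: Joint | None = None                     # set for articulated parts
    contains: list = field(default_factory=list)   # object-container links

# Example: a cabinet drawer that conceals a target item. A robot given only
# this graph can look up both how the drawer moves and what it hides.
drawer = SceneNode(
    "kitchen_drawer_1",
    joint=Joint("prismatic", (1.0, 0.0, 0.0), (0.4, 0.0, 0.6), (0.0, 0.45)),
    contains=["scissors"],
)
cabinet = SceneNode("kitchen_cabinet", children=[drawer])
```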
Problem

Research questions and friction points this paper is trying to address.

egocentric vision
articulated objects
3D scene graphs
robotic mapping
mobile manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric vision
articulated 3D scene graphs
mobile manipulation
human-to-robot knowledge transfer
object articulation modeling