🤖 AI Summary
Existing robotic exploration methods over-rely on passive visual perception, limiting their ability to reason about spatial and functional relationships among objects—thereby hindering active exploration in large-scale, complex environments. To address this, we propose the Actionable 3D Relational Object Graph (A3ROG), the first graph-based representation that explicitly models *actability* as a core structural attribute, unifying multi-type object relations and complex action spaces. Our approach integrates multimodal scene understanding, action-conditioned graph reasoning, and interaction-feedback-driven exploration policy learning, bridging the technical gap between tabletop manipulation and mobile robotic exploration. Evaluated across diverse real-world scenes, A3ROG achieves state-of-the-art performance in exploration completeness, object discovery rate, and cross-scene generalization—significantly outperforming vision-language model baselines.
📝 Abstract
Mobile exploration is a longstanding challenge in robotics, yet current methods primarily focus on active perception instead of active interaction, limiting the robot's ability to interact with and fully explore its environment. Existing robotic exploration approaches via active interaction are often restricted to tabletop scenes, neglecting the unique challenges posed by mobile exploration, such as large exploration spaces, complex action spaces, and diverse object relations. In this work, we introduce a 3D relational object graph that encodes diverse object relations and enables exploration through active interaction. We develop a system based on this representation and evaluate it across diverse scenes. Our qualitative and quantitative results demonstrate the system's effectiveness and generalization capabilities, outperforming methods that rely solely on vision-language models (VLMs).