🤖 AI Summary
Existing robotic systems struggle to efficiently and scalably update semantic information online in dynamic environments, limiting the real-time responsiveness and adaptability of task planning. To address this, we propose SPARK—a novel semantic integration framework that enables online construction and incremental updating of scene graphs. SPARK jointly leverages SLAM-derived geometric constraints and environment embeddings to explicitly model spatial-semantic relationships in a graph-structured representation. Furthermore, it introduces a spatially aware knowledge reasoning mechanism, enabling, for the first time, online recognition and reactive execution based on non-canonical interaction cues (e.g., gestures). Experimental results demonstrate that SPARK significantly improves task execution success rates and generalization capability in complex, dynamic scenarios—particularly excelling in tasks triggered by unconventional perceptual cues.
📝 Abstract
The ability to update information acquired through various means online during task execution is crucial for a general-purpose service robot. This information includes both geometric and semantic data. While SLAM handles geometric updates on 2D maps or 3D point clouds, online updating of semantic information remains largely unexplored. We attribute this challenge to the difficulty of maintaining an online scene graph representation that is both useful and scalable. Building on prior work on offline scene graph representations, we study online graph representations of semantic information. We introduce SPARK: Spatial Perception and Robot Knowledge Integration. The framework extracts semantic information from environment-embedded cues and updates the scene graph accordingly; the updated graph is then used for subsequent task planning. We demonstrate that graph representations of spatial relationships enhance the robot system's ability to perform tasks in dynamic environments and to adapt to unconventional spatial cues, such as gestures.
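The core loop the abstract describes, extracting semantic cues online and incrementally updating a graph that a planner can query, can be sketched minimally. The class and method names below are illustrative assumptions for exposition, not SPARK's actual interface:

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: SPARK's scene graph is described at a high
# level in the abstract, so all names here are illustrative assumptions.

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # object id -> semantic label
    edges: dict = field(default_factory=dict)   # (a, b) -> spatial relation

    def update_node(self, node_id, label):
        """Insert or revise a semantic label observed online."""
        self.nodes[node_id] = label

    def update_edge(self, a, b, relation):
        """Record a spatial relation, e.g. derived from SLAM geometry."""
        self.edges[(a, b)] = relation

    def query(self, relation):
        """Return node pairs matching a relation, for downstream planning."""
        return [pair for pair, r in self.edges.items() if r == relation]

# Incremental online updates as new observations arrive:
g = SceneGraph()
g.update_node("cup", "graspable")
g.update_node("table", "surface")
g.update_edge("cup", "table", "on")
print(g.query("on"))  # → [('cup', 'table')]
```

A later observation (say, the cup moved to a shelf) would simply call `update_edge` again, so the planner always queries the current state rather than a stale offline graph.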