🤖 AI Summary
This work proposes an expectation-guided dynamic semantic scene understanding framework to address the lack of semantic consistency in existing purely data-driven robotic object recognition methods, which struggle to incorporate environmental priors. By constructing a 3D semantic scene graph that integrates contextual priors with external knowledge bases such as ConceptNet, the framework introduces knowledge-level expectations as a prior for interpreting sensor data. Object reasoning is performed within a heterogeneous graph neural network, leveraging these knowledge-driven expectations to guide perception. The approach significantly enhances both semantic consistency and temporal continuity of object recognition in dynamic environments, enabling robots to interpret scenes more coherently over time.
📝 Abstract
While deep learning has significantly advanced robotic object recognition, purely data-driven approaches often lack semantic consistency and fail to leverage valuable, pre-existing knowledge about the environment. This report presents the ExPrIS project, which addresses this challenge by investigating how knowledge-level expectations can serve as priors to improve object interpretation from sensor data. Our approach is based on the incremental construction of a 3D Semantic Scene Graph (3DSSG). We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external graphs such as ConceptNet. These are embedded into a heterogeneous Graph Neural Network (GNN) to create an expectation-biased inference process. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time. The report details this architecture, its evaluation, and outlines its planned integration on a mobile robotic platform.
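To make the idea of expectation-biased inference concrete, here is a minimal, hypothetical sketch (not the ExPrIS implementation, which uses a heterogeneous GNN): raw detector scores for an ambiguous object are combined with log-priors from two assumed sources, a contextual prior from past observations of the scene and a knowledge prior in the spirit of ConceptNet relatedness. All labels, prior values, and the weighting parameters `alpha` and `beta` are illustrative assumptions.

```python
import math

def softmax(scores):
    """Normalize a dict of scores into a probability distribution."""
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def expectation_biased_inference(detector_logits, contextual_prior,
                                 knowledge_prior, alpha=0.5, beta=0.5):
    """Bias detector evidence with log-priors from context and knowledge.

    alpha/beta weight how strongly each expectation source influences
    the final interpretation; 1e-6 guards against log(0) for unseen labels.
    """
    biased = {}
    for label, logit in detector_logits.items():
        biased[label] = (logit
                         + alpha * math.log(contextual_prior.get(label, 1e-6))
                         + beta * math.log(knowledge_prior.get(label, 1e-6)))
    return softmax(biased)

# Ambiguous detection: "mug" vs "bowl" nearly tied on raw detector scores.
detector_logits = {"mug": 1.1, "bowl": 1.0, "chair": -2.0}
contextual_prior = {"mug": 0.6, "bowl": 0.3, "chair": 0.1}  # seen here before
knowledge_prior = {"mug": 0.5, "bowl": 0.4, "chair": 0.1}   # scene-level relatedness
posterior = expectation_biased_inference(detector_logits, contextual_prior,
                                         knowledge_prior)
best = max(posterior, key=posterior.get)
```

In this toy example the expectations break the near-tie in favor of "mug"; in the actual framework the analogous biasing happens inside GNN message passing over the scene graph rather than as an explicit per-label product.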