🤖 AI Summary
Current semantic SLAM systems are limited in the depth of their semantic understanding, the robustness of their loop closure detection, and the stability of their data association. This work proposes an RGB-based semantic SLAM framework built on a unified geometry-instance representation, which, for the first time, extends geometry foundation models to jointly predict dense geometry and view-consistent instance embeddings. By introducing viewpoint-invariant semantic anchors, the method bridges the gap between geometric reconstruction and open-vocabulary semantic mapping, enabling semantically coherent data association and instance-guided loop closure detection. Experimental results show that the proposed system significantly outperforms state-of-the-art approaches in map consistency and in loop closure reliability under wide-baseline scenarios.
📝 Abstract
Geometry foundation models have significantly advanced dense geometric SLAM, yet existing systems often lack deep semantic understanding and robust loop closure capabilities. Meanwhile, contemporary semantic mapping approaches are frequently hindered by decoupled architectures and fragile data association. We propose IRIS-SLAM, a novel RGB semantic SLAM system that leverages unified geometry-instance representations derived from an instance-extended foundation model. By extending a geometry foundation model to concurrently predict dense geometry and cross-view consistent instance embeddings, we enable a semantic-synergized association mechanism and instance-guided loop closure detection. Our approach uses viewpoint-agnostic semantic anchors to bridge the gap between geometric reconstruction and open-vocabulary mapping. Experimental results demonstrate that IRIS-SLAM significantly outperforms state-of-the-art methods, particularly in map consistency and wide-baseline loop closure reliability.
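To make the idea of instance-guided loop closure more concrete, the following is a minimal sketch of how view-consistent instance embeddings could be used to score a loop candidate between two keyframes. It is not IRIS-SLAM's actual implementation: the function names (`mutual_matches`, `loop_score`), the mutual-nearest-neighbour rule, the `min_sim` threshold of 0.8, and the toy one-hot embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def mutual_matches(emb_a, emb_b, min_sim=0.8):
    """Return (i, j) pairs that are mutual nearest neighbours with similarity >= min_sim.

    emb_a, emb_b: arrays of shape (num_instances, dim), one embedding per instance.
    min_sim is a hypothetical threshold, not a value taken from the paper.
    """
    if len(emb_a) == 0 or len(emb_b) == 0:
        return []
    sim = l2_normalize(emb_a) @ l2_normalize(emb_b).T
    best_b = sim.argmax(axis=1)  # best match in B for each instance in A
    best_a = sim.argmax(axis=0)  # best match in A for each instance in B
    return [
        (i, int(j))
        for i, j in enumerate(best_b)
        if best_a[j] == i and sim[i, j] >= min_sim
    ]


def loop_score(emb_a, emb_b, min_sim=0.8):
    """Fraction of instances in the smaller frame that were matched (0.0 to 1.0)."""
    if len(emb_a) == 0 or len(emb_b) == 0:
        return 0.0
    matches = mutual_matches(emb_a, emb_b, min_sim)
    return len(matches) / min(len(emb_a), len(emb_b))


# Toy data: six distinct "instances" represented by one-hot embeddings.
basis = np.eye(6)
frame_a = basis[[0, 1, 2, 3]] + 0.05  # first visit, slightly perturbed
frame_b = basis[[3, 1, 2, 4]]         # revisit: shares instances 1, 2, 3 in another order
frame_c = basis[[4, 5]]               # unrelated place: no shared instances

revisit_score = loop_score(frame_a, frame_b)
unrelated_score = loop_score(frame_a, frame_c)
```

In this toy setup the revisited frame scores higher than the unrelated one because the match depends on which instances are visible rather than on their order or image position, which is the property that makes instance embeddings attractive for wide-baseline loop closure. A real system would presumably follow such a candidate score with geometric verification before accepting the loop.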