DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping

📅 2026-03-04
🤖 AI Summary
Existing open-vocabulary semantic mapping methods rely on image cropping, which incurs contextual loss, high computational overhead, and domain shift, thereby hindering real-time robotic perception. To address these limitations, this work proposes a single-pass, crop-free dense semantic feature extraction mechanism that directly harvests high-fidelity CLIP embeddings from intermediate layers of a Vision Transformer. A distance-weighted fusion strategy is introduced to achieve mask-aligned, purely semantic representations. Integrated within a fully GPU-accelerated architecture, the method enables an end-to-end online semantic mapping framework capable of voxel-level zero-shot open-vocabulary mapping. Evaluated on Replica, ScanNet, and HM3DSEM datasets, the approach significantly outperforms existing zero-shot methods in both semantic accuracy and retrieval performance while maintaining real-time deployability.
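
The retrieval setting mentioned in the summary can be illustrated with a small sketch. It assumes that each mapped instance already carries one embedding and that a text query has been encoded into the same space; the toy vectors, the instance names, and the helper names `cosine` and `rank_instances` are hypothetical and do not come from the paper or its code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_instances(query_embedding, instance_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [
        (name, cosine(query_embedding, embedding))
        for name, embedding in instance_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-D vectors standing in for real CLIP embeddings (assumption).
toy_map = {
    "chair": [0.9, 0.1, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.5, 0.5, 0.0],
}
toy_query = [1.0, 0.0, 0.0]
ranking = rank_instances(toy_query, toy_map)
```

In a real pipeline the query vector would come from a CLIP text encoder and the stored vectors from the mapping system; only the ranking step is shown here.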

📝 Abstract
Open-set semantic mapping enables language-driven robotic perception, but current instance-centric approaches are bottlenecked by context-depriving and computationally expensive crop-based feature extraction. To overcome this fundamental limitation, we introduce DISC (Dense Integrated Semantic Context), featuring a novel single-pass, distance-weighted extraction mechanism. By deriving high-fidelity CLIP embeddings directly from the vision transformer's intermediate layers, our approach eliminates the latency and domain-shift artifacts of traditional image cropping, yielding pure, mask-aligned semantic representations. To fully leverage these features in large-scale continuous mapping, DISC is built upon a fully GPU-accelerated architecture that replaces periodic offline processing with precise, on-the-fly voxel-level instance refinement. We evaluate our approach on standard benchmarks (Replica, ScanNet) and a newly generated large-scale-mapping dataset based on Habitat-Matterport 3D (HM3DSEM) to assess scalability across complex scenes in multi-story buildings. Extensive evaluations demonstrate that DISC significantly surpasses current state-of-the-art zero-shot methods in both semantic accuracy and query retrieval, providing a robust, real-time capable framework for robotic deployment. The full source code, data generation and evaluation pipelines will be made available at https://github.com/DFKI-NI/DISC.
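
The abstract does not spell out how the distance weighting is computed. As one possible reading, the sketch below pools dense patch features inside an instance mask and weights each patch by its distance to the nearest patch outside the mask, so that interior patches count more than boundary patches. The weighting rule, the function name `distance_weighted_pool`, and the toy grid are assumptions for illustration, not the authors' method.

```python
import math


def distance_weighted_pool(features, mask):
    """Pool an H x W grid of D-dim features over a boolean mask.

    Assumed weighting: Euclidean distance (in patch units) from each
    masked patch to the nearest unmasked patch. Returns an L2-normalised
    vector.
    """
    height, width = len(mask), len(mask[0])
    inside = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    outside = [(r, c) for r in range(height) for c in range(width) if not mask[r][c]]
    if not inside:
        raise ValueError("mask selects no patches")

    dim = len(features[0][0])
    pooled = [0.0] * dim
    total_weight = 0.0
    for r, c in inside:
        if outside:
            # Brute-force nearest-outside distance; fine for small grids.
            weight = min(math.hypot(r - orow, c - ocol) for orow, ocol in outside)
        else:
            weight = 1.0  # mask covers everything: fall back to a plain mean
        total_weight += weight
        for k in range(dim):
            pooled[k] += weight * features[r][c][k]

    pooled = [value / total_weight for value in pooled]
    norm = math.sqrt(sum(value * value for value in pooled)) or 1.0
    return [value / norm for value in pooled]


# Toy 1 x 5 grid: the mask covers the three middle patches.
toy_features = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]
toy_mask = [[False, True, True, True, False]]
pooled_embedding = distance_weighted_pool(toy_features, toy_mask)
```

In this toy case the centre patch gets weight 2 and its two neighbours weight 1, so both feature dimensions contribute equally after pooling, whereas an unweighted mean would favour the first dimension. For full-resolution masks a proper distance transform would replace the brute-force search.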
Problem

Research questions and friction points this paper is trying to address.

open-set semantic mapping
crop-based feature extraction
context-depriving
computational efficiency
large-scale semantic mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense Integrated Semantic Context
single-pass feature extraction
distance-weighted embedding
GPU-accelerated semantic mapping
open-set semantic segmentation
Felix Igelbrink
German Research Center for Artificial Intelligence (DFKI), Osnabrück, Germany
Lennart Niecksch
German Research Center for Artificial Intelligence (DFKI), Osnabrück, Germany and Department of Computer Science, Osnabrück University, Osnabrück, Germany
Martin Atzmueller
Professor - Osnabrück University & Scientific Director - German Research Center for AI (DFKI)
complex data, explainable AI, interpretability, machine perception, semantic modeling
Joachim Hertzberg
University of Osnabrück and DFKI, Osnabrück
Artificial Intelligence, Robotics, plan-based robot control, sensor data interpretation, Agricultural Robotics