LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation

📅 2026-02-11
📈 Citations: 1
Influential: 1
🤖 AI Summary
This work addresses two challenges in query-based 3D instance segmentation: query initialization, which is difficult because point clouds are sparse, and the high computational cost of attention-based query decoders. The authors propose LaSSM, which improves initial query quality through a hierarchical semantic-spatial query initializer operating on superpoints, and refines queries with a coordinate-guided State Space Model (SSM) decoder that combines a local aggregation scheme with a spatial dual-path SSM block. This design avoids incorporating noisy information and reduces redundant computation. LaSSM ranks first on the ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only one-third of the FLOPs, and achieves competitive results on ScanNet, ScanNet200, S3DIS, and ScanNet++ V1.

📝 Abstract
Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods suffer from a query initialization dilemma due to the sparse nature of point clouds, and they rely on computationally intensive attention mechanisms in their query decoders. We accordingly introduce LaSSM, which prioritizes simplicity and efficiency while maintaining competitive performance. Specifically, we propose a hierarchical semantic-spatial query initializer that derives the query set from superpoints by considering both semantic cues and spatial distribution, achieving comprehensive scene coverage and accelerated convergence. We further present a coordinate-guided state space model (SSM) decoder that progressively refines queries. The decoder features a local aggregation scheme that restricts the model to geometrically coherent regions, and a spatial dual-path SSM block that captures underlying dependencies within the query set by integrating the associated coordinate information. Our design enables efficient instance prediction, avoiding the incorporation of noisy information and reducing redundant computation. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs, demonstrating its strength in challenging large-scale scene instance segmentation. LaSSM also achieves competitive performance on the ScanNet, ScanNet200, S3DIS, and ScanNet++ V1 benchmarks at lower computational cost. Extensive ablation studies and qualitative results validate the effectiveness of our design. The code and weights are available at https://github.com/RayYoh/LaSSM.
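To make the idea of choosing queries from superpoints using both semantic cues and spatial distribution more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the procedure, so this example assumes a simple two-step scheme (threshold on a foreground score, then farthest point sampling over superpoint centers). The function name `select_queries`, its parameters, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_queries(
    centers: Sequence[Point],
    fg_scores: Sequence[float],
    num_queries: int,
    score_threshold: float = 0.5,  # assumed value, for illustration only
) -> List[int]:
    """Return indices of superpoints that look like foreground and are spread out in space."""
    # Semantic step (assumption): keep superpoints whose foreground score passes the threshold.
    candidates = [i for i, s in enumerate(fg_scores) if s >= score_threshold]
    if not candidates or num_queries <= 0:
        return []

    # Spatial step (assumption): farthest point sampling over the surviving centers,
    # seeded with the most confident candidate.
    first = max(candidates, key=lambda i: fg_scores[i])
    selected = [first]
    remaining = [i for i in candidates if i != first]
    # min_dist[i] is the distance from candidate i to its nearest selected center.
    min_dist = {i: math.dist(centers[i], centers[first]) for i in remaining}

    while remaining and len(selected) < num_queries:
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(centers[i], centers[nxt]))
    return selected


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    scores = [0.9, 0.8, 0.7, 0.6, 0.1]
    print(select_queries(centers, scores, num_queries=3))
```

In this toy input, the last superpoint is dropped by the score threshold even though it is the most distant, and the near-duplicate at index 1 is passed over in favor of better-spread centers. The sketch illustrates only the general trade-off between semantic confidence and scene coverage; the actual hierarchical initializer is in the linked repository.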
Problem

Research questions and friction points this paper is trying to address.

3D instance segmentation
query initialization
point clouds
computational efficiency
attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Aggregation
State Space Model
Semantic-Spatial Query Initialization
Coordinate-Guided Decoder
Efficient 3D Instance Segmentation