CoSMo3D: Open-World Promptable 3D Semantic Part Segmentation through LLM-Guided Canonical Spatial Modeling

2026-03-01
Citations: 0
Influential: 0
AI Summary
Existing open-world, promptable 3D semantic part segmentation methods operate in sensor coordinates, which hinders their ability to model the stable semantics of an object's functional roles. To overcome this limitation, we propose a learning framework based on a latent canonical reference frame: a dual-branch architecture performs canonical map anchoring and canonical box calibration, transferring perception from the input pose space to a unified canonical space. We introduce a large language model-guided intra- and cross-category canonical alignment mechanism, construct a unified canonical dataset spanning 200 categories, and learn pose-invariant canonical embeddings within the model. The approach achieves state-of-the-art performance on open-world promptable 3D part segmentation and improves segmentation stability and cross-category generalization.

Abstract
Open-world promptable 3D semantic segmentation remains brittle because semantics are inferred in the input sensor coordinates. Humans, in contrast, interpret parts via functional roles in a canonical space: wings extend laterally, handles protrude to the side, and legs support from below. Psychophysical evidence shows that we mentally rotate objects into canonical frames to reveal these roles. To fill this gap, we propose CoSMo3D, which attains canonical space perception by inducing a latent canonical reference frame learned directly from data. By construction, we create a unified canonical dataset through LLM-guided intra- and cross-category alignment, exposing canonical spatial regularities across 200 categories. By induction, we realize canonicality inside the model through a dual-branch architecture with canonical map anchoring and canonical box calibration, collapsing pose variation and symmetry into a stable canonical embedding. This shift from input pose space to canonical embedding yields far more stable and transferable part semantics. Experimental results show that CoSMo3D establishes a new state of the art in open-world promptable 3D segmentation.
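
The intuition that a spatial rule such as "legs support from below" only holds in a canonical frame can be illustrated with a toy sketch. This is not the paper's method: the chair points, the `rotate_x` and `label_by_height` helpers, and the height rule are hypothetical, and the pose is assumed to be known here, whereas the abstract describes a latent canonical frame learned from data.

```python
import math

def rotate_x(points, angle):
    """Rotate 3D points about the x-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]

def label_by_height(points, threshold=0.0):
    """Toy rule: points below `threshold` on the z-axis are 'leg', else 'seat'."""
    return ["leg" if z < threshold else "seat" for _, _, z in points]

# Hypothetical chair in its canonical frame (z is up): four leg tips and a seat.
canonical = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1), (0, 0, 1)]
expected = label_by_height(canonical)

# The sensor observes the chair tipped over by 90 degrees about the x-axis.
pose = math.pi / 2
observed = rotate_x(canonical, pose)

# Applying the rule directly in sensor coordinates mislabels some parts.
sensor_labels = label_by_height(observed)

# Undoing the pose (assumed known in this sketch) restores canonical
# coordinates, so the same rule gives the expected labels again.
recovered = rotate_x(observed, -pose)
canonical_labels = label_by_height(recovered)

print("sensor frame:   ", sensor_labels)
print("canonical frame:", canonical_labels)
```

In this sketch the rule is fixed and only the coordinate frame changes, which is the sense in which moving from input pose space to a canonical space can stabilize part semantics.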
Problem

Research questions and friction points this paper is trying to address.

open-world
3D semantic segmentation
canonical space
part segmentation
promptable
Innovation

Methods, ideas, or system contributions that make the work stand out.

canonical space modeling
LLM-guided alignment
open-world 3D segmentation
pose-invariant embedding
promptable part segmentation
Li Jin
SDU
Weikai Chen
Principal Research Scientist, Tencent America
3D AIGC, 3D Vision, Computer graphics, VLM
Yujie Wang
UNC Chapel Hill
Yingda Yin
LIGHTSPEED
Zeyu Hu
LIGHTSPEED
Runze Zhang
LIGHTSPEED
Keyang Luo
LIGHTSPEED
Shengju Qian
The Chinese University of Hong Kong
Generative Models, Transformers
Xin Wang
LIGHTSPEED
Xueying Qin
Shandong University
Augmented Reality, Computer Vision, Computer Graphics