TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

📅 2026-03-09
🤖 AI Summary
This work proposes a calibration-free, feed-forward framework for text-guided 3D localization and segmentation that produces geometrically consistent results without per-scene optimization. Its Geometry-Aware Semantic Attention (GASA) mechanism uses predicted geometry to gate cross-view correspondences. This suppresses matches that are semantically plausible but geometrically inconsistent, and it does not require ground-truth camera poses. The model processes high-resolution frames (1008×1008) end to end. It achieves state-of-the-art feed-forward text-guided segmentation and localization on five benchmarks, including ScanNet++ and uCO3D, and reduces user effort from O(N) manual clicks to a single text query. At about 57 ms per frame (about 18 FPS), it is fast enough for interactive robotics and augmented reality applications.
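The latency and throughput figures are related by simple arithmetic. The minimal Python sketch below assumes frames are processed one at a time, so throughput is the reciprocal of per-frame latency. The helper names are illustrative and do not come from the paper's code.

```python
# Relate per-frame latency to throughput, assuming frames are processed
# sequentially (no batching or pipelining). Helper names are illustrative.

def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second implied by a per-frame latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def megapixels_per_second(side: int, latency_ms: float) -> float:
    """Megapixels processed per second for square side x side frames."""
    return side * side * fps_from_latency_ms(latency_ms) / 1e6


print(f"{fps_from_latency_ms(57.0):.1f} FPS")            # 17.5 FPS, reported as ~18
print(f"{megapixels_per_second(1008, 57.0):.1f} Mpx/s")  # 17.8 Mpx/s at 1008x1008
```

1000 / 57 is about 17.5, which rounds to the reported ~18 FPS. If frames were batched or pipelined, measured throughput could differ from this simple reciprocal.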

📝 Abstract
Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which utilizes predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from O(N) clicks to a single text query. The model processes each frame at 1008×1008 resolution in ~57 ms (~18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at https://cwru-aism.github.io/triangulang/.
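The gating idea behind GASA can be illustrated with a generic distance-gated cross-attention. This sketch is not the paper's formulation. It assumes each token carries a feature vector and a predicted 3D point in a shared frame (no ground-truth poses). It also gates semantic attention with a Gaussian on 3D distance, using a hypothetical bandwidth parameter `sigma`. Adding the log of the gate to the attention logits makes each weight proportional to exp(q·k/√d) · exp(-‖p_q - p_k‖² / (2σ²)). All function names are illustrative.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of geometry-gated cross-view attention; NOT the
# TrianguLang/GASA implementation. Each token carries a feature vector and a
# predicted 3D point, assumed to lie in a shared frame without GT poses.
Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_weights(q: Vec, q_pt: Vec, keys: List[Vec],
                            key_pts: List[Vec], sigma: float) -> List[float]:
    """Softmax over scaled dot-product scores plus a log-space Gaussian gate.

    sigma is a hypothetical bandwidth; sigma = math.inf disables the gate.
    """
    scale = 1.0 / math.sqrt(len(q))
    logits = [
        dot(q, k) * scale - sq_dist(q_pt, k_pt) / (2.0 * sigma ** 2)
        for k, k_pt in zip(keys, key_pts)
    ]
    return softmax(logits)


def gated_cross_attention(queries: List[Vec], query_pts: List[Vec],
                          keys: List[Vec], values: List[Vec],
                          key_pts: List[Vec], sigma: float = 0.1) -> List[List[float]]:
    """Aggregate view-B values for each view-A query token."""
    dim = len(values[0])
    outputs = []
    for q, q_pt in zip(queries, query_pts):
        w = gated_attention_weights(q, q_pt, keys, key_pts, sigma)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(dim)])
    return outputs


# Two candidates that look identical (e.g. two matching chairs): one is near
# the query's predicted 3D point, the other is 2 units away.
q, q_pt = [1.0, 0.0], (0.0, 0.0, 0.0)
keys = [[1.0, 0.0], [1.0, 0.0]]
key_pts = [(0.0, 0.0, 0.05), (2.0, 0.0, 0.0)]
values = [[1.0, 0.0], [0.0, 1.0]]

print(gated_attention_weights(q, q_pt, keys, key_pts, sigma=math.inf))  # [0.5, 0.5]
print(gated_attention_weights(q, q_pt, keys, key_pts, sigma=0.1))  # near candidate ~1.0
print(gated_cross_attention([q], [q_pt], keys, values, key_pts, sigma=0.1))
```

With the gate disabled, two semantically identical candidates split the attention evenly. With sigma = 0.1, nearly all the weight moves to the candidate whose predicted 3D point lies close to the query's. A practical model would operate on batched tensors rather than Python lists.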
Problem

Research questions and friction points this paper is trying to address.

3D localization
pose-free
natural language
geometric consistency
feed-forward inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Aware Semantic Attention
pose-free 3D localization
feed-forward 3D reasoning
cross-view feature correspondence
text-guided segmentation
Bryce Grant
Case Western Reserve University, Cleveland, OH, USA
Aryeh Rothenberg
Case Western Reserve University, Cleveland, OH, USA
Atri Banerjee
Case Western Reserve University, Cleveland, OH, USA
Peng Wang
School of Computer Science, Northwestern Polytechnical University, China
Computer Vision, Machine Learning, Artificial Intelligence