🤖 AI Summary
This work proposes a calibration-free, feed-forward framework for text-guided 3D localization and segmentation that is both efficient and geometrically consistent. Its Geometry-Aware Semantic Attention (GASA) mechanism suppresses semantically plausible but geometrically inconsistent cross-view correspondences without requiring ground-truth pose priors, and multi-view features are fused end-to-end from high-resolution (1008×1008) images. The approach achieves state-of-the-art performance across five benchmarks, including ScanNet++ and uCO3D, where a single text query replaces O(N) manual clicks. Running at ~57 ms per frame (~18 FPS), the method is well-suited for real-time applications in robotics and augmented reality.
📝 Abstract
Localizing objects and parts from natural language in 3D space is essential for robotics, AR, and embodied AI, yet existing methods face a trade-off between the accuracy and geometric consistency of per-scene optimization and the efficiency of feed-forward inference. We present TrianguLang, a feed-forward framework for 3D localization that requires no camera calibration at inference. Unlike prior methods that treat views independently, we introduce Geometry-Aware Semantic Attention (GASA), which utilizes predicted geometry to gate cross-view feature correspondence, suppressing semantically plausible but geometrically inconsistent matches without requiring ground-truth poses. Validated on five benchmarks including ScanNet++ and uCO3D, TrianguLang achieves state-of-the-art feed-forward text-guided segmentation and localization, reducing user effort from $O(N)$ clicks to a single text query. The model processes each frame at $1008\times1008$ resolution in $\sim$57ms ($\sim$18 FPS) without optimization, enabling practical deployment for interactive robotics and AR applications. Code and checkpoints are available at https://cwru-aism.github.io/triangulang/.
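To make the core idea concrete, here is a minimal sketch of geometry-gated cross-view attention in the spirit of GASA. This is not the paper's implementation: the function name, the exponential gating form, the temperature `tau`, and the use of pointwise 3D distances from predicted geometry are all illustrative assumptions.

```python
# Illustrative sketch only: cross-view attention whose logits are gated by
# predicted 3D geometry, so matches that are semantically similar but
# geometrically far apart are down-weighted. Not the paper's actual GASA.
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def geometry_gated_attention(feat_a, feat_b, pts_a, pts_b, tau=0.2):
    """Attend from view A tokens to view B tokens with a geometric gate.

    feat_a: (Na, D) features of view A; feat_b: (Nb, D) features of view B.
    pts_a:  (Na, 3) predicted 3D points for A; pts_b: (Nb, 3) for B.
    tau: assumed distance scale of the gate (hypothetical hyperparameter).
    """
    # Semantic similarity logits (scaled dot product).
    sim = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])
    # Pairwise 3D distances between predicted points of the two views.
    dist = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Geometric gate in (0, 1]: near-coincident points pass, distant ones don't.
    gate = np.exp(-dist / tau)
    # Apply the gate in log space, then normalize into attention weights.
    attn = softmax(sim + np.log(gate + 1e-8), axis=-1)
    # Aggregate view-B features for each view-A token.
    return attn @ feat_b
```

In this toy form, two view-B tokens that look equally similar to a view-A token semantically will receive very different attention weights if only one of them back-projects near the same predicted 3D point, which is the behavior the abstract attributes to GASA.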