Geometric Context Transformer for Streaming 3D Reconstruction

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of simultaneously achieving geometric accuracy, temporal consistency, and computational efficiency in 3D reconstruction from video streams. The authors propose LingBot-Map, a feed-forward 3D foundation model that is motivated by SLAM principles and built on a Geometric Context Transformer (GCT) architecture. The GCT's attention mechanism combines an anchor context, a pose-reference window, and a trajectory memory, which address coordinate grounding, dense geometric cues, and long-range drift correction, respectively, while keeping the streaming state compact. Across multiple benchmarks, LingBot-Map is reported to outperform both existing streaming methods and iterative optimization-based methods, and it runs stably at around 20 FPS on 518 × 378 inputs over sequences exceeding 10,000 frames.

📝 Abstract

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 × 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
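
To illustrate why a three-part context can keep the streaming state compact, here is a minimal bookkeeping sketch. It is not the authors' implementation. The class `StreamingContext`, the patch size of 14, the window of 8 frames, the single anchor frame, and the one-token-per-frame memory are all assumptions chosen for illustration; the abstract specifies none of these values. The sketch only counts how many tokens a new frame would attend to, and compares that with keeping dense tokens for every past frame.

```python
from collections import deque
from dataclasses import dataclass, field

PATCH = 14  # assumption: ViT-style patch size, not stated in the source
# Dense tokens per 518 x 378 frame under that assumption: 37 * 27 patches.
DENSE_TOKENS = (518 // PATCH) * (378 // PATCH)


@dataclass
class StreamingContext:
    """Hypothetical bounded context: anchor + recent window + compact memory."""
    num_anchor: int = 1                # assumed: first frame(s) fix the coordinate frame
    window_size: int = 8               # assumed: recent frames kept with dense tokens
    memory_tokens_per_frame: int = 1   # assumed: one compact token per older frame
    anchor: list = field(default_factory=list)
    window: deque = field(default_factory=deque)
    memory: list = field(default_factory=list)

    def push(self, frame_id: int) -> None:
        # The earliest frames become the anchor context and are never evicted.
        if len(self.anchor) < self.num_anchor:
            self.anchor.append(frame_id)
            return
        # Later frames enter the sliding window; the oldest window frame
        # is demoted to a compact memory entry once the window is full.
        self.window.append(frame_id)
        if len(self.window) > self.window_size:
            self.memory.append(self.window.popleft())

    def context_tokens(self) -> int:
        # Dense tokens for anchor and window frames, compact tokens for the rest.
        dense = (len(self.anchor) + len(self.window)) * DENSE_TOKENS
        compact = len(self.memory) * self.memory_tokens_per_frame
        return dense + compact


def full_history_tokens(num_frames: int) -> int:
    # Baseline for comparison: keep dense tokens for every past frame.
    return num_frames * DENSE_TOKENS


ctx = StreamingContext()
for t in range(10_000):
    ctx.push(t)

print("bounded context tokens:", ctx.context_tokens())
print("full-history tokens:   ", full_history_tokens(10_000))
```

Under these assumed settings, the dense part of the context stays fixed at nine frames regardless of sequence length, and only the small per-frame memory grows. How the actual model builds and bounds its trajectory memory is not described in the abstract.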

Problem

Research questions and friction points this paper is trying to address.

Streaming 3D Reconstruction

Geometric Accuracy

Temporal Consistency

Computational Efficiency

SLAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Context Transformer

Streaming 3D Reconstruction

Trajectory Memory