Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring

📅 2026-04-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the computational redundancy that arises when monocular SLAM systems built on geometric foundation models (GFMs) process dense video streams: such systems must run dense geometric decoding on a frame just to decide whether it contributes new geometry. To mitigate this, the authors propose LeanGate, a lightweight feed-forward frame-gating network that predicts each frame's geometric utility score before GFM feature extraction, so that redundant frames are filtered out early. LeanGate is presented as the first method to perform utility assessment before geometric decoding, and it is plug-and-play compatible with existing GFM-SLAM pipelines. On standard SLAM benchmarks, LeanGate reduces tracking FLOPs by more than 85%, achieves a 5x end-to-end throughput speedup, and maintains tracking and mapping accuracy comparable to dense baselines.

📝 Abstract
Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection. Because of this, they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, resulting in late rejection and wasted computation. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score to assess a frame's mapping value prior to the heavy GFM feature extraction and matching stages. As a predictive plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks demonstrate that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup. Furthermore, it maintains the tracking and mapping accuracy of dense baselines.
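
The gating idea in the abstract can be sketched as a loop that scores each incoming frame cheaply and only runs the heavy stage when the score clears a threshold. This is a minimal illustration under stated assumptions, not the authors' implementation: `GatedTracker`, `toy_score`, the `threshold` value, and the policy of scoring against the last kept frame are all hypothetical, and the toy score is a plain mean absolute difference standing in for the learned LeanGate network.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Sequence

# Stand-in for an image: a flat list of numbers (assumption for illustration).
Frame = Sequence[float]


@dataclass
class GatedTracker:
    """Runs an expensive stage only on frames whose utility score is high enough."""

    score_fn: Callable[[Frame, Frame], float]  # hypothetical lightweight gate
    heavy_fn: Callable[[Frame], object]        # hypothetical GFM extraction/matching stage
    threshold: float = 0.5                     # assumed cut-off, not taken from the paper
    kept: List[int] = field(default_factory=list)
    skipped: List[int] = field(default_factory=list)

    def run(self, frames: Iterable[Frame]) -> List[int]:
        reference: Optional[Frame] = None  # last frame that passed the gate
        for idx, frame in enumerate(frames):
            # The first frame always passes; later frames are scored
            # BEFORE any heavy computation is spent on them.
            if reference is None or self.score_fn(reference, frame) >= self.threshold:
                self.heavy_fn(frame)
                self.kept.append(idx)
                reference = frame
            else:
                self.skipped.append(idx)  # early rejection: heavy stage never runs
        return self.kept


def toy_score(ref: Frame, cur: Frame) -> float:
    """Toy utility: mean absolute difference to the reference frame."""
    return sum(abs(a - b) for a, b in zip(ref, cur)) / len(cur)


heavy_calls: List[Frame] = []  # records which frames reached the heavy stage
tracker = GatedTracker(score_fn=toy_score, heavy_fn=heavy_calls.append, threshold=0.5)
frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [3.0, 3.0]]
kept = tracker.run(frames)
print(kept, tracker.skipped, len(heavy_calls))
```

In this toy run, frames 1 and 3 are nearly identical to the last kept frame, so they are rejected before the heavy stage is invoked. A post hoc keyframe selector would have paid the heavy cost on all five frames first.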

Problem

Research questions and friction points this paper is trying to address.

Monocular SLAM
Geometric Foundation Models
Computational Redundancy
Keyframe Selection
Dense Video Streams
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Utility Scoring
Frame Gating
Monocular SLAM
Geometric Foundation Models
Computational Efficiency