MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited scalability of full-attention neural visual geometry models, whose GPU memory demands prevent them from handling large, unordered image collections. The authors propose MERG3R, a training-free divide-and-conquer framework that first partitions the input images into overlapping, geometrically diverse subsets, each reconstructed independently by an existing geometry model. The local reconstructions are then globally aligned and merged with confidence-weighted bundle adjustment to produce a consistent 3D model. The framework is model-agnostic and can be paired with existing neural geometry models. On 7-Scenes, NRGBD, Tanks & Temples, and Cambridge Landmarks, it consistently improves reconstruction accuracy, memory efficiency, and scalability, enabling high-quality 3D reconstruction even when the image set exceeds GPU memory capacity.
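To make the memory argument concrete, the following rough sketch counts pairwise token interactions under full attention, with and without splitting the images into subsets. It is a back-of-the-envelope proxy, not a measurement from the paper: the function names, the image count, the tokens-per-image figure, and the subset sizes are all hypothetical, and real memory use depends on the model and on the attention implementation.

```python
def attention_pairs(num_images, tokens_per_image):
    """Pairwise token interactions when all images attend to each other."""
    total_tokens = num_images * tokens_per_image
    return total_tokens * total_tokens


def chunked_attention_pairs(chunk_sizes, tokens_per_image):
    """Total interactions when each subset is processed independently."""
    return sum(attention_pairs(n, tokens_per_image) for n in chunk_sizes)


# Hypothetical workload: 1000 images, 1000 tokens per image.
full = attention_pairs(1000, 1000)

# Hypothetical split: 10 subsets of 110 images (100 unique + 10 shared).
chunk_sizes = [110] * 10
total_chunked = chunked_attention_pairs(chunk_sizes, 1000)

# If subsets are processed one at a time, the peak is set by the largest one.
peak_chunked = max(attention_pairs(n, 1000) for n in chunk_sizes)

print(full, total_chunked, peak_chunked)
```

Under these assumed numbers the peak per-subset interaction count is far below the full-attention count, which is the intuition behind processing subsets independently and merging afterwards.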

📝 Abstract

Recent advancements in neural visual geometry, including transformer-based models such as VGGT and Pi3, have achieved impressive accuracy on 3D reconstruction tasks. However, their reliance on full attention makes them fundamentally limited by GPU memory capacity, preventing them from scaling to large, unordered image collections. We introduce MERG3R, a training-free divide-and-conquer framework that enables geometric foundation models to operate far beyond their native memory limits. MERG3R first reorders and partitions unordered images into overlapping, geometrically diverse subsets that can be reconstructed independently. It then merges the resulting local reconstructions through an efficient global alignment and confidence-weighted bundle adjustment procedure, producing a globally consistent 3D model. Our framework is model-agnostic and can be paired with existing neural geometry models. Across large-scale datasets, including 7-Scenes, NRGBD, Tanks & Temples, and Cambridge Landmarks, MERG3R consistently improves reconstruction accuracy, memory efficiency, and scalability, enabling high-quality reconstruction when the dataset exceeds memory capacity limits.
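The partitioning step can be illustrated with a minimal sketch that splits an already ordered list of images into consecutive subsets sharing a fixed number of images. This is not the authors' procedure: the abstract does not specify how MERG3R reorders images or selects geometrically diverse subsets, so the sketch assumes the ordering is given, and `partition_with_overlap`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration.

```python
def partition_with_overlap(items, chunk_size, overlap):
    """Split an ordered sequence into consecutive chunks.

    Each chunk shares its first `overlap` items with the end of the
    previous chunk, so neighbouring local reconstructions have common
    images that a later alignment step could use.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    items = list(items)
    if not items:
        return []

    step = chunk_size - overlap  # number of new items per chunk
    chunks = []
    start = 0
    while True:
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # the last chunk reached the end of the sequence
        start += step
    return chunks


# Hypothetical image identifiers, assumed to be already ordered.
images = [f"img_{i:03d}" for i in range(10)]
subsets = partition_with_overlap(images, chunk_size=4, overlap=1)
for subset in subsets:
    print(subset)
```

The shared images between neighbouring subsets are what make a later merge possible: each is reconstructed twice, once per subset, giving correspondences between the two local coordinate frames. The final subset may be shorter than `chunk_size` when the image count does not divide evenly.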

Problem

Research questions and friction points this paper is trying to address.

neural visual geometry

3D reconstruction

GPU memory limitation

large-scale image collections

scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

divide-and-conquer

neural visual geometry

memory-efficient 3D reconstruction