Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

📅 2025-07-03

🤖 AI Summary

Existing implicit-memory methods for dense 3D scene reconstruction have limited capacity and tend to lose information from early frames. To address this, the authors propose an explicit spatial pointer memory: each pointer is tied to a 3D position in the global coordinate system and carries a spatial feature that is updated as new frames arrive. A 3D hierarchical position embedding supports the interaction between features of the latest frame and the pointer memory, and a simple fusion mechanism keeps the memory uniform and efficient. Together these enable online, incremental, dense reconstruction from either ordered sequences or unordered image collections. The method reports competitive or state-of-the-art performance on various tasks with low training cost. Code is available at https://github.com/YkiWu/Point3R.

📝 Abstract

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.
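The idea of a pointer memory tied to 3D positions can be pictured with a small sketch. This is a minimal illustration and not the Point3R implementation: the names Pointer, PointerMemory, and merge_radius, the distance-threshold test, and the running-average merge rule are all assumptions made for this example. In the paper the spatial features are learned and the interaction with the latest frame happens inside the network; here a plain list of floats stands in for a feature.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pointer:
    position: tuple  # (x, y, z) in the global coordinate system
    feature: list    # stand-in for a learned spatial feature
    count: int = 1   # number of observations merged into this pointer


@dataclass
class PointerMemory:
    # Hypothetical threshold: observations closer than this to an
    # existing pointer are merged into it instead of creating a new one.
    merge_radius: float = 0.1
    pointers: list = field(default_factory=list)

    def _nearest(self, position):
        """Return the closest pointer and its distance, or (None, inf)."""
        best, best_dist = None, math.inf
        for pointer in self.pointers:
            dist = math.dist(pointer.position, position)
            if dist < best_dist:
                best, best_dist = pointer, dist
        return best, best_dist

    def fuse(self, position, feature):
        """Integrate one observation from the latest frame into the memory."""
        nearest, dist = self._nearest(position)
        if nearest is None or dist > self.merge_radius:
            # No pointer nearby: allocate a new one at this 3D position.
            self.pointers.append(Pointer(tuple(position), list(feature)))
            return
        # A pointer is nearby: update it with a running average (assumed rule).
        n = nearest.count
        nearest.position = tuple(
            (n * old + new) / (n + 1)
            for old, new in zip(nearest.position, position)
        )
        nearest.feature = [
            (n * old + new) / (n + 1)
            for old, new in zip(nearest.feature, feature)
        ]
        nearest.count = n + 1


memory = PointerMemory(merge_radius=0.1)
memory.fuse((0.0, 0.0, 0.0), [1.0, 0.0])   # first observation: new pointer
memory.fuse((0.05, 0.0, 0.0), [0.0, 1.0])  # within radius: merged
memory.fuse((1.0, 0.0, 0.0), [0.5, 0.5])   # far away: new pointer
```

Merging nearby observations keeps the number of pointers bounded by the spatial extent of the scene instead of the number of frames, which is one way to read the abstract's goal of a memory that stays "uniform and efficient".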

Problem

Research questions and friction points this paper is trying to address.

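Dense 3D reconstruction from ordered image sequences or unordered image collections

Implicit memory in prior methods has limited capacity and loses information from earlier frames

Integrating each new observation into a global coordinate system in an online, streaming setting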

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit spatial pointer memory for 3D reconstruction

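3D hierarchical position embedding that promotes interaction between the latest frame and the pointer memory

Simple yet effective fusion mechanism that keeps the pointer memory uniform and efficient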
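To give a feel for what a hierarchical embedding of a 3D position might look like, here is a minimal sketch that encodes each coordinate with sine and cosine pairs at several spatial scales. It is an assumption-laden stand-in, not the formulation used in Point3R: the function name hierarchical_position_embedding, the parameters num_levels and base_scale, and the choice of doubling wavelengths are all hypothetical.

```python
import math


def hierarchical_position_embedding(position, num_levels=3, base_scale=1.0):
    """Encode an (x, y, z) position with sin/cos pairs at several scales.

    Level k uses wavelength base_scale * 2**k (an assumed schedule), so
    low levels resolve fine detail and high levels vary slowly.
    The output has num_levels * 3 * 2 values.
    """
    embedding = []
    for level in range(num_levels):
        wavelength = base_scale * (2 ** level)
        for coord in position:
            angle = 2.0 * math.pi * coord / wavelength
            embedding.append(math.sin(angle))
            embedding.append(math.cos(angle))
    return embedding


origin = hierarchical_position_embedding((0.0, 0.0, 0.0))
shifted = hierarchical_position_embedding((1.0, 0.0, 0.0))
```

With these defaults, a shift of one unit along x leaves the finest level unchanged (a full period) but flips the cosine at the next level, so the coarser levels disambiguate positions that the finest level cannot tell apart.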
