Continuous 3D Perception Model with Persistent State

📅 2025-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses online perception and reconstruction of 3D scenes with a stateful recurrent model whose persistent state is updated with each new observation. The model, CUT3R (Continuous Updating Transformer for 3D Reconstruction), accepts either video streams or unordered photo collections of varying length, with static or dynamic content, and predicts a metric-scale pointmap (per-pixel 3D points) for each input in an online fashion. All pointmaps share a common coordinate system, so they can be accumulated into a dense reconstruction that updates as new images arrive. The state can also be probed at virtual, unobserved views to infer regions of the scene that were never observed. The method reports competitive or state-of-the-art performance across a range of 3D/4D tasks.

📝 Abstract

We present a unified framework capable of solving a broad range of 3D tasks. Our approach features a stateful recurrent model that continuously updates its state representation with each new observation. Given a stream of images, this evolving state can be used to generate metric-scale pointmaps (per-pixel 3D points) for each new input in an online fashion. These pointmaps reside within a common coordinate system, and can be accumulated into a coherent, dense scene reconstruction that updates as new images arrive. Our model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views. Our method is simple yet highly flexible, naturally accepting varying lengths of images that may be either video streams or unordered photo collections, containing both static and dynamic content. We evaluate our method on various 3D/4D tasks and demonstrate competitive or state-of-the-art performance in each. Project Page: https://cut3r.github.io/
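The online loop described in the abstract (update a persistent state per image, read out a pointmap in a shared coordinate system, accumulate into one scene) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyState`, `toy_step`, and `reconstruct_online` are hypothetical names, the "image" is just a grid of depth-like numbers, and the hand-written update rule stands in for the learned Transformer. It is not CUT3R's implementation or API.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]
Image = List[List[float]]  # toy "image": an H x W grid of depth-like values


@dataclass(frozen=True)
class ToyState:
    """Hypothetical persistent state; a real model would hold learned tokens."""
    camera_x: float = 0.0  # toy camera position in the shared coordinate frame
    frames_seen: int = 0


def toy_step(state: ToyState, image: Image) -> Tuple[ToyState, List[List[Point]]]:
    """Placeholder for one recurrent step (assumption, not the learned model).

    Reads out one 3D point per pixel, expressed in the shared frame using the
    current state, then returns an updated state for the next observation.
    """
    pointmap = [
        [(state.camera_x + u, float(v), depth) for u, depth in enumerate(row)]
        for v, row in enumerate(image)
    ]
    # Toy update rule: pretend the camera moves one unit along x per frame.
    new_state = ToyState(state.camera_x + 1.0, state.frames_seen + 1)
    return new_state, pointmap


def reconstruct_online(images: List[Image]) -> Tuple[ToyState, List[Point]]:
    """Process images one at a time, accumulating all points into one scene."""
    state = ToyState()
    scene: List[Point] = []
    for image in images:
        state, pointmap = toy_step(state, image)
        # Pointmaps already share a coordinate frame, so they can be merged
        # by simple concatenation as each new image arrives.
        scene.extend(point for row in pointmap for point in row)
    return state, scene


if __name__ == "__main__":
    frames = [[[1.0, 2.0]], [[3.0, 4.0]]]  # two 1x2 toy frames
    final_state, scene = reconstruct_online(frames)
    print(final_state.frames_seen, len(scene))
```

The point of the sketch is the control flow: because each readout is already expressed in the common frame, the scene grows incrementally without a separate global alignment step over all frames.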

Problem

Research questions and friction points this paper is trying to address.

3D World Modeling

Real-time Update

Adaptive Content

Innovation

Methods, ideas, or system contributions that make the work stand out.

CUT3R

Continuous State Updating

3D/4D Scene Reconstruction
