AI Summary
This work addresses the challenges of geometry-motion coupling and sparse or constrained outputs in 4D reconstruction from monocular video by proposing 4RC, a unified feed-forward framework. 4RC introduces a novel "encode once, query anywhere in space-time" paradigm that decouples 4D attributes into a static base geometry and time-varying relative motion. Leveraging a Transformer-based spatio-temporal encoder and a conditional query decoder, the method learns dense 4D representations end to end. It supports high-fidelity querying of geometry and motion for arbitrary query frames at continuous target timestamps, achieving state-of-the-art performance across multiple 4D reconstruction benchmarks and significantly outperforming both existing and concurrent approaches.
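In equation form (the notation below is ours, not taken from the paper), this factorization reads

$$
P(x, t) = P_{\text{base}}(x) + \Delta P(x, t),
$$

where $P_{\text{base}}(x)$ is the static base geometry of a point $x$ in a given view and $\Delta P(x, t)$ is its time-varying relative motion, queryable at any continuous timestamp $t$.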
Abstract
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches, which typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere-and-anytime paradigm: a Transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form, decomposing them into a static base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
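The abstract does not include an implementation, but the encode-once, query-anywhere-and-anytime interface can be sketched as follows. This is a minimal illustration assuming a PyTorch-style model; the class name `FourRCSketch`, the heads `head_base`/`head_motion`, the layer counts, and the timestamp embedding are all our assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FourRCSketch(nn.Module):
    """Illustrative sketch of encode-once, query-anywhere-and-anytime.

    Layer counts, widths, and the timestamp embedding are assumptions;
    they do not reproduce the authors' architecture.
    """

    def __init__(self, dim: int = 256):
        super().__init__()
        # Spatio-temporal encoder: run ONCE over the whole video clip.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Conditional decoder: cross-attends each query to the cached latents.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.time_embed = nn.Linear(1, dim)   # embeds a continuous timestamp
        self.head_base = nn.Linear(dim, 3)    # static base geometry (xyz)
        self.head_motion = nn.Linear(dim, 3)  # relative motion at time t

    def encode(self, video_tokens: torch.Tensor) -> torch.Tensor:
        """video_tokens: (B, N, dim) tokens for the entire clip."""
        return self.encoder(video_tokens)

    def query(self, latents: torch.Tensor, frame_tokens: torch.Tensor,
              t: torch.Tensor):
        """frame_tokens: (B, M, dim) tokens of the query frame; t: (B, 1)."""
        q = frame_tokens + self.time_embed(t).unsqueeze(1)  # condition on t
        feat = self.decoder(q, latents)
        base = self.head_base(feat)           # P_base(x)
        motion = self.head_motion(feat)       # delta P(x, t)
        return base, base + motion            # geometry at time t


# Usage: encode the clip once, then issue arbitrary space-time queries.
model = FourRCSketch(dim=256)
latents = model.encode(torch.randn(1, 512, 256))       # whole-video tokens
base, geo_t = model.query(latents, torch.randn(1, 128, 256),
                          t=torch.tensor([[0.3]]))     # any frame, any t
```

The design point the abstract emphasizes is amortization: the heavy encoder runs once per video, so each additional space-time query only pays the cost of the lightweight conditional decoder.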