Interp3R: Continuous-time 3D Geometry Estimation with Frames and Events

๐Ÿ“… 2026-03-15
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the limitation of existing point-based 3D vision models, which operate solely on discrete image frames and struggle to capture scene dynamics in continuous time. To overcome this, we propose the first integration of event camera data into a point-based foundation model, introducing a continuous-time representation that fuses asynchronous event streams with conventional frame-based images. Our approach leverages point-based interpolation, continuous geometric alignment, and end-to-end joint optimization to enable depth and camera pose estimation at arbitrary time instants. Trained exclusively on synthetic data, the method demonstrates strong generalization across diverse synthetic and real-world benchmarks, significantly outperforming two-stage baselines that first interpolate 2D frames and then perform 3D reconstruction. This represents a notable advance beyond the temporal discretization inherent in traditional approaches.

๐Ÿ“ Abstract
In recent years, 3D visual foundation models pioneered by pointmap-based approaches such as DUSt3R have attracted considerable interest, achieving impressive accuracy and strong generalization across diverse scenes. However, these methods are inherently limited to recovering scene geometry only at the discrete time instants when images are captured, leaving the scene evolution during the blind time between consecutive frames largely unexplored. We introduce Interp3R, to the best of our knowledge the first method that enhances pointmap-based models to estimate depth and camera poses at arbitrary time instants. Interp3R leverages asynchronous event data to interpolate pointmaps produced by frame-based models, enabling temporally continuous geometric representations. Depth and camera poses are then jointly recovered by aligning the interpolated pointmaps, together with those predicted by the underlying frame-based models, into a consistent spatial framework. We train Interp3R exclusively on a synthetic dataset, yet demonstrate strong generalization across a wide range of synthetic and real-world benchmarks. Extensive experiments show that Interp3R outperforms, by a considerable margin, state-of-the-art baselines that follow a two-stage pipeline of 2D video frame interpolation followed by 3D geometry estimation.
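To make the pointmap-interpolation idea concrete, here is a minimal sketch of blending two frame-time pointmaps to a query time. This is an illustrative simplification, not the paper's method: Interp3R conditions the interpolation on the asynchronous event stream between frames, whereas this toy version (with hypothetical names `interpolate_pointmap`, `pm_t0`, `pm_t1`) derives the blend weight from timestamps alone.

```python
import numpy as np

def interpolate_pointmap(pm_t0, pm_t1, t, t0, t1):
    """Linearly blend two H x W x 3 pointmaps to a query time t in [t0, t1].

    A deliberately simplified stand-in: the actual model uses event data
    captured between the two frames to guide a learned interpolation,
    rather than a purely timestamp-based linear blend.
    """
    w = (t - t0) / (t1 - t0)          # fraction of the inter-frame interval
    return (1.0 - w) * pm_t0 + w * pm_t1

# Two toy 2x2 pointmaps "captured" at t0 = 0.0 and t1 = 1.0.
pm0 = np.zeros((2, 2, 3))
pm1 = np.ones((2, 2, 3))

pm_mid = interpolate_pointmap(pm0, pm1, t=0.5, t0=0.0, t1=1.0)
print(pm_mid[0, 0])  # every 3D point sits halfway: [0.5 0.5 0.5]
```

In the full pipeline, the interpolated pointmaps and the frame-based predictions would additionally be aligned into one consistent spatial frame to recover depth and camera poses jointly; that alignment step is omitted here.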
Problem

Research questions and friction points this paper is trying to address.

continuous-time 3D geometry
frame-based 3D reconstruction
temporal interpolation
event camera
scene evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

continuous-time 3D reconstruction
event-based interpolation
pointmap interpolation
depth estimation
camera pose estimation
๐Ÿ”Ž Similar Papers