🤖 AI Summary
Existing methods for dynamic 4D asset reconstruction either rely on category-specific priors or incur prohibitive optimization latency. To address this, we propose the first Transformer-based, feed-forward framework for implicit temporal interpolation. Our method enforces causal consistency via a dedicated loss, jointly optimizing triplane feature interpolation and implicit neural representations to synthesize high-fidelity, deformable geometry and UV-consistent textured mesh sequences at arbitrary continuous time steps from sparse keyframes. We further integrate a diffusion model to enhance multi-view consistency. The framework enables end-to-end monocular video reconstruction with inference in seconds. Evaluated on multiple dynamic datasets, it significantly outperforms FiLM and linear-interpolation baselines. To our knowledge, this is the first approach to combine cross-category generalization, high fidelity, and production-ready speed in 4D reconstruction.
📝 Abstract
Reconstructing dynamic assets from video data is central to many tasks in computer vision and graphics. Existing 4D reconstruction approaches are limited by category-specific models or slow optimization-based methods. Inspired by the recent Large Reconstruction Model (LRM), we present the Large Interpolation Model (LIM), a transformer-based feed-forward solution, guided by a novel causal consistency loss, for interpolating implicit 3D representations across time. Given implicit 3D representations at times $t_0$ and $t_1$, LIM produces a deformed shape at any continuous time $t \in [t_0, t_1]$, delivering high-quality interpolated frames in seconds. Furthermore, LIM allows explicit mesh tracking across time, producing a consistently UV-textured mesh sequence ready for integration into existing production pipelines. We also use LIM, in conjunction with a diffusion-based multiview generator, to produce dynamic 4D reconstructions from monocular videos. We evaluate LIM on various dynamic datasets, benchmarking against image-space interpolation methods (e.g., FiLM) and direct triplane linear interpolation, and demonstrate clear advantages. In summary, LIM is the first feed-forward model capable of high-speed tracked 4D asset reconstruction across diverse categories.
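For context, the "direct triplane linear interpolation" baseline that the abstract compares against can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the tensor shapes, the function name, and the blending weight are assumptions; LIM replaces this fixed linear blend with a learned transformer that predicts the interpolated representation.

```python
import numpy as np

def lerp_triplane(tri0: np.ndarray, tri1: np.ndarray,
                  t0: float, t1: float, t: float) -> np.ndarray:
    """Linearly blend two triplane feature grids at times t0 and t1.

    tri0, tri1: hypothetical triplane features of shape (3, C, H, W)
    (three axis-aligned feature planes, C channels, HxW resolution).
    Returns the blended triplane at continuous time t in [t0, t1].
    """
    assert t0 <= t <= t1 and t1 > t0
    w = (t - t0) / (t1 - t0)          # normalized interpolation weight
    return (1.0 - w) * tri0 + w * tri1

# Toy keyframes: 3 planes, 4 channels, 8x8 resolution.
tri0 = np.zeros((3, 4, 8, 8))
tri1 = np.ones((3, 4, 8, 8))
mid = lerp_triplane(tri0, tri1, t0=0.0, t1=1.0, t=0.5)
print(mid.mean())  # 0.5
```

A fixed blend like this cannot represent non-linear deformation between keyframes, which is the limitation a learned feed-forward interpolator is meant to address.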