InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing controllable 4D video generation methods rely on fine-tuning pre-trained video diffusion models (VDMs), which is computationally expensive, demands large-scale data and architectural modifications, and often causes catastrophic forgetting of the original generative priors. Method: We propose a fine-tuning-free inverse solver that formulates controllable generation as latent-space inpainting. Pixel-level degradation operations, such as camera or viewpoint changes, are encoded into continuous multi-channel latent masks, implicitly modeling geometric transformations while preserving the pre-trained VDM's generative prior. This avoids repeated VAE encoding/decoding and gradient backpropagation. Contribution/Results: Our method achieves novel view generation comparable to fine-tuned approaches and superior measurement consistency in camera control tasks, with near-zero computational overhead. It also handles general-purpose video inpainting with editing.

📝 Abstract
Recent approaches to controllable 4D video generation often rely on fine-tuning pre-trained Video Diffusion Models (VDMs). This dominant paradigm is computationally expensive, requiring large-scale datasets and architectural modifications, and frequently suffers from catastrophic forgetting of the model's original generative priors. Here, we propose InverseCrafter, an efficient inpainting inverse solver that reformulates the 4D generation task as an inpainting problem solved in the latent space. The core of our method is a principled mechanism to encode the pixel space degradation operator into a continuous, multi-channel latent mask, thereby bypassing the costly bottleneck of repeated VAE operations and backpropagation. InverseCrafter not only achieves comparable novel view generation and superior measurement consistency in camera control tasks with near-zero computational overhead, but also excels at general-purpose video inpainting with editing. Code is available at https://github.com/yeobinhong/InverseCrafter.

Problem

Research questions and friction points this paper is trying to address.

Efficient 4D video generation without fine-tuning diffusion models
Reformulating video generation as a latent inpainting inverse problem
Reducing computational cost while maintaining generative quality and consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates 4D generation as a latent-space inpainting problem
Encodes the pixel-space degradation operator into a continuous, multi-channel latent mask
Bypasses the costly bottleneck of repeated VAE operations and backpropagation
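
To make the latent-mask idea concrete, the following is a minimal NumPy sketch of one way a binary pixel-space visibility mask could be turned into a continuous latent-resolution mask and used to blend latents. It is not the paper's mechanism: the average-pooling rule, the identical-per-channel broadcast, and the function names `downsample_mask`, `to_multichannel`, and `blend_latents` are all assumptions made for illustration. The actual encoding is in the repository linked in the abstract.

```python
import numpy as np


def downsample_mask(pixel_mask: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a binary (H, W) pixel mask into a continuous latent-size mask.

    Assumption: the VAE downsamples each spatial axis by `factor`, so each
    latent cell covers a factor x factor pixel block. The pooled value is the
    fraction of visible pixels in that block, hence continuous in [0, 1].
    """
    h, w = pixel_mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by the downsampling factor")
    blocks = pixel_mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def to_multichannel(mask: np.ndarray, channels: int) -> np.ndarray:
    """Repeat a (h, w) mask across latent channels, giving shape (C, h, w).

    Simplification: every channel gets the same mask here; a real
    multi-channel mask may differ per channel.
    """
    return np.repeat(mask[None, :, :], channels, axis=0)


def blend_latents(z_generated: np.ndarray, z_measured: np.ndarray,
                  latent_mask: np.ndarray) -> np.ndarray:
    """Inpainting-style blend: keep measured latents where the mask is high."""
    return latent_mask * z_measured + (1.0 - latent_mask) * z_generated


# Toy example: a 4x4 pixel mask pooled to a 2x2 latent mask with 3 channels.
pixel_mask = np.zeros((4, 4))
pixel_mask[:2, :2] = 1.0   # fully visible block
pixel_mask[0, 2] = 1.0     # one visible pixel in the neighbouring block
latent_mask = to_multichannel(downsample_mask(pixel_mask, factor=2), channels=3)
blended = blend_latents(np.zeros((3, 2, 2)), np.ones((3, 2, 2)), latent_mask)
```

Because the blend operates directly on latents, a step like this needs neither a VAE decode/encode round trip nor gradients through the decoder, which is the kind of cost the summary says the method avoids.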