🤖 AI Summary
This paper addresses the challenges of temporal incoherence and cross-view inconsistency in instruction-driven editing of 4D scenes (spatiotemporal + multi-view). We propose PSF-4D, a progressive sampling framework that requires no external models. Its core contributions are: (i) the first introduction of correlated Gaussian noise modeling to explicitly enforce inter-frame temporal consistency; (ii) a novel cross-view shared-independent noise decomposition mechanism, coupled with view-aware iterative refinement, that enables joint optimization of temporal, spatial, and view consistency within a unified diffusion framework; and (iii) support for diverse editing tasks, including style transfer, multi-attribute editing, object removal, and local editing. Extensive experiments demonstrate that PSF-4D consistently outperforms state-of-the-art methods in editing fidelity, temporal coherence, and multi-view consistency.
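The summary does not give the exact form of the correlated noise; the sketch below shows one standard way such inter-frame correlation can be realized, an AR(1)-style recursion in which `alpha` is a hypothetical correlation coefficient, not a parameter named in the paper.

```python
import torch

def correlated_temporal_noise(num_frames, shape, alpha=0.9, generator=None):
    """AR(1)-style correlated Gaussian noise across frames (illustrative).

    eps_t = alpha * eps_{t-1} + sqrt(1 - alpha^2) * z_t, with z_t ~ N(0, I),
    so every frame stays unit-variance while consecutive frames have
    correlation alpha. alpha=0 gives independent per-frame noise;
    alpha -> 1 gives identical noise in every frame.
    """
    noise = torch.randn(num_frames, *shape, generator=generator)
    for t in range(1, num_frames):
        noise[t] = alpha * noise[t - 1] + (1.0 - alpha**2) ** 0.5 * noise[t]
    return noise  # (num_frames, *shape); candidate initial noise per frame
```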
📝 Abstract
Instruction-guided generative models, especially those built on text-to-image (T2I) and text-to-video (T2V) diffusion frameworks, have advanced content editing in recent years. To extend these capabilities to 4D scenes, we introduce a progressive sampling framework for 4D editing (PSF-4D) that ensures temporal and multi-view consistency by controlling the noise initialization during forward diffusion. For temporal coherence, we design a correlated Gaussian noise structure that links frames over time, allowing each frame to depend meaningfully on prior frames. To ensure spatial consistency across views, we employ a cross-view noise model that uses shared and independent noise components to balance commonalities and distinct details among different views. To further enhance spatial coherence, PSF-4D incorporates view-consistent iterative refinement, embedding view-aware information into the denoising process so that edits remain aligned across frames and views. Our approach enables high-quality 4D editing without relying on external models, addressing key limitations of previous methods. Extensive evaluation on multiple benchmarks and editing tasks (e.g., style transfer, multi-attribute editing, object removal, and local editing) shows that PSF-4D outperforms state-of-the-art 4D editing methods.
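The cross-view shared/independent decomposition can likewise be sketched as a variance-preserving mix of one noise tensor common to all views and a per-view tensor; `shared_ratio` below is a hypothetical weighting, not a parameter from the paper.

```python
import torch

def cross_view_noise(num_views, shape, shared_ratio=0.5, generator=None):
    """Shared + independent Gaussian noise across views (illustrative).

    eps_v = sqrt(r) * eps_shared + sqrt(1 - r) * eps_v_ind keeps each
    view's noise unit-variance while giving any two views correlation r,
    balancing cross-view commonality against view-specific detail.
    """
    shared = torch.randn(*shape, generator=generator)            # common to all views
    indep = torch.randn(num_views, *shape, generator=generator)  # independent per view
    r = shared_ratio
    return r**0.5 * shared.unsqueeze(0) + (1.0 - r) ** 0.5 * indep
```

In a full pipeline the two constructions would presumably be composed, so that each (frame, view) latent draws its initial noise from both the temporal recursion and the view decomposition.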