EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection

📅 2025-10-11

🤖 AI Summary

Existing 3D editing methods rely heavily on costly image editing models, incur substantial computational overhead, and are hard to integrate into iterative editing pipelines. To address these limitations, the paper introduces a single-frame-guided paradigm for efficient 3D editing. The approach uses open-source video generation foundation models (e.g., SVD) to propagate a single-frame edit across multi-view video sequences, and adds a consistency-aware view selection mechanism that identifies geometrically and semantically coherent key views for feedforward 3D reconstruction, which removes the need for per-frame editing. The framework avoids iterative optimization and reduces dependence on proprietary APIs and specialized image editing models. Experiments on commonly used 3D editing datasets show that the method outperforms state-of-the-art 3D editing baselines in editing quality while remaining efficient.

📝 Abstract

Recent advances in foundation models have driven remarkable progress in image editing, yet their extension to 3D editing remains underexplored. A natural approach is to replace the image editing modules in existing workflows with foundation models. However, their heavy computational demands and the restrictions and costs of closed-source APIs make plugging these models into existing iterative editing strategies impractical. To address this limitation, we propose EditCast3D, a pipeline that employs video generation foundation models to propagate edits from a single first frame across the entire dataset prior to reconstruction. While editing propagation enables dataset-level editing via video models, its consistency remains suboptimal for 3D reconstruction, where multi-view alignment is essential. To overcome this, EditCast3D introduces a view selection strategy that explicitly identifies consistent and reconstruction-friendly views and adopts feedforward reconstruction without requiring costly refinement. In combination, the pipeline both minimizes reliance on expensive image editing and mitigates prompt ambiguities that arise when applying foundation models independently across images. We evaluate EditCast3D on commonly used 3D editing datasets and compare it against state-of-the-art 3D editing baselines, demonstrating superior editing quality and high efficiency. These results establish EditCast3D as a scalable and general paradigm for integrating foundation models into 3D editing pipelines. The code is available at https://github.com/UNITES-Lab/EditCast3D

Problem

Research questions and friction points this paper is trying to address.

Extending foundation models to efficient 3D editing pipelines

Ensuring multi-view consistency for 3D reconstruction from edits

Reducing computational costs of iterative editing strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Propagates edits via video generation foundation models

Selects consistent views for improved 3D reconstruction

Uses feedforward reconstruction without costly refinement
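
The view selection idea above can be illustrated with a small sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: each propagated view is represented by a hypothetical feature vector, consistency is approximated by mean cosine similarity to the other views, and the function name `select_consistent_views` and the parameter `keep` are invented. It is not the authors' implementation.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_consistent_views(features, keep):
    """Return the indices of the `keep` views that agree most with the rest.

    Assumption: `features[i]` is some per-view descriptor (hypothetical),
    and agreement is the mean cosine similarity to every other view.
    """
    n = len(features)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the number of views")

    scores = []
    for i in range(n):
        others = [cosine(features[i], features[j]) for j in range(n) if j != i]
        # A single view has nothing to disagree with, so score it as 1.0.
        scores.append(sum(others) / len(others) if others else 1.0)

    # Rank views by score (highest first), keep the top ones, and
    # return them in their original order for downstream reconstruction.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy data: three views that roughly agree and one outlier (index 3).
views = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
selected = select_consistent_views(views, keep=3)
```

In this toy setup the outlier view is the one dropped, and the remaining indices would be passed to whatever feedforward reconstruction step follows. A real pipeline would likely use a geometry-aware consistency measure rather than plain feature similarity.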
