3D-Fixup: Advancing Photo Editing with 3D Priors

📅 2025-05-15

🤖 AI Summary

This work addresses 3D-aware image editing from a single input image, specifically object translation and 3D rotation. We propose 3D-Fixup, a training-based framework that incorporates learned 3D priors into a diffusion-based editing pipeline. Training pairs, each a source frame and a target frame, are generated from video data, and an Image-to-3D model supplies explicit 3D guidance by projecting 2D information into 3D space, so the diffusion model does not have to infer the source-to-target transformation on its own. A dedicated data generation pipeline keeps this 3D guidance high quality throughout training. Results show that integrating these 3D priors supports complex, identity-coherent 3D-aware edits with high-quality outputs.

📝 Abstract

Despite significant advances in modeling image priors via diffusion models, 3D-aware image editing remains challenging, in part because the object is only specified via a single image. To tackle this challenge, we propose 3D-Fixup, a new framework for editing 2D images guided by learned 3D priors. The framework supports difficult editing situations such as object translation and 3D rotation. To achieve this, we leverage a training-based approach that harnesses the generative power of diffusion models. As video data naturally encodes real-world physical dynamics, we turn to video data for generating training data pairs, i.e., a source and a target frame. Rather than relying solely on a single trained model to infer transformations between source and target frames, we incorporate 3D guidance from an Image-to-3D model, which bridges this challenging task by explicitly projecting 2D information into 3D space. We design a data generation pipeline to ensure high-quality 3D guidance throughout training. Results show that by integrating these 3D priors, 3D-Fixup effectively supports complex, identity-coherent 3D-aware edits, achieving high-quality results and advancing the application of diffusion models in realistic image manipulation. The code is provided at https://3dfixup.github.io/
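To make the two edit types named in the abstract concrete, the following is a minimal sketch of a rigid 3D edit (rotation about a vertical axis plus translation) applied to a toy set of 3D points and re-projected to the image plane with a pinhole camera. It is an illustration only, not the 3D-Fixup pipeline: the function names, the focal length, the principal point, and the toy points are all assumptions chosen for this example, and the 3D points stand in for whatever geometry an Image-to-3D model might produce.

```python
import math

def rotate_y(point, angle_deg):
    """Rotate a 3D point about the vertical (y) axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return (c * x + s * z, y, -s * x + c * z)

def rigid_edit(points, center, angle_deg, translation):
    """Rotate points about `center` (y axis), then translate them."""
    cx, cy, cz = center
    tx, ty, tz = translation
    edited = []
    for (x, y, z) in points:
        # Rotate in object-centered coordinates so the object turns in place.
        rx, ry, rz = rotate_y((x - cx, y - cy, z - cz), angle_deg)
        edited.append((rx + cx + tx, ry + cy + ty, rz + cz + tz))
    return edited

def project(point, focal=500.0, cx=256.0, cy=256.0):
    """Pinhole projection of a camera-space point (z > 0) to pixel coordinates.

    focal, cx, cy are hypothetical intrinsics for a 512x512 image.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)

# Toy "object": four corners of a flat square, 4 units in front of the camera.
object_points = [(-0.5, -0.5, 4.0), (0.5, -0.5, 4.0),
                 (0.5, 0.5, 4.0), (-0.5, 0.5, 4.0)]
center = (0.0, 0.0, 4.0)

# The edit: turn the object 90 degrees and shift it one unit to the right.
edited_points = rigid_edit(object_points, center,
                           angle_deg=90.0, translation=(1.0, 0.0, 0.0))

source_pixels = [project(p) for p in object_points]
target_pixels = [project(p) for p in edited_points]
```

Comparing `source_pixels` with `target_pixels` shows where the edited geometry lands in the image; a coarse rendering of this kind is one plausible form that explicit 3D guidance for a diffusion model could take.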

Problem

Research questions and friction points this paper is trying to address.

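3D-aware editing of 2D images is hard because the object is specified by only a single image

Edits such as object translation and 3D rotation are difficult to support

Inferring the transformation between source and target frames with a single trained model alone is challenging without explicit 3D guidance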

Innovation

Methods, ideas, or system contributions that make the work stand out.

Guides 2D image editing with learned 3D priors

Trains a diffusion model on source/target frame pairs generated from video data

Adds explicit 3D guidance from an Image-to-3D model, which projects 2D information into 3D space
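The use of video data for training pairs can be sketched as sampling a source and a target frame index from the same clip. This is a hedged illustration rather than the authors' data generation pipeline: the function name `sample_frame_pairs`, the gap bounds `min_gap` and `max_gap`, and the fixed seed are all hypothetical choices made for this example.

```python
import random

def sample_frame_pairs(num_frames, num_pairs, min_gap=5, max_gap=30, seed=0):
    """Sample (source, target) frame index pairs with a bounded temporal gap.

    min_gap and max_gap are hypothetical values: a small gap gives nearly
    identical frames, a large gap may change the scene too much.
    """
    if max_gap < min_gap:
        raise ValueError("max_gap must be at least min_gap")
    if num_frames <= min_gap:
        raise ValueError("video too short for the requested minimum gap")
    rng = random.Random(seed)  # seeded for reproducible sampling
    pairs = []
    for _ in range(num_pairs):
        # Clamp the gap so the target index stays inside the clip.
        gap = rng.randint(min_gap, min(max_gap, num_frames - 1))
        source = rng.randint(0, num_frames - 1 - gap)
        pairs.append((source, source + gap))
    return pairs

# Example: four pairs from a 120-frame clip.
pairs = sample_frame_pairs(num_frames=120, num_pairs=4)
```

Each returned tuple indexes two frames of the same object at different times, so the motion between them is physically plausible by construction, which is the property of video data the abstract relies on.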
