🤖 AI Summary
This work addresses geometric image editing with diffusion models: repositioning, reorienting, or non-rigidly deforming an object while preserving global scene coherence. We propose a training-free, decoupled editing framework that splits an edit into three sequential stages: geometric transformation of the target object, inpainting of the source region, and detail refinement of the target region. Inpainting and refinement are both carried out by FreeFine, a training-free diffusion approach, so the pipeline supports zero-shot 2D and 3D geometric editing without model fine-tuning. Our key innovation is the explicit decoupling of geometric operations from diffusion-based reconstruction (a first in the literature, to our knowledge), which avoids the distortions and semantic drift that arise when a single step must handle every subtask at once. Evaluated on our newly introduced GeoBench benchmark, the method achieves state-of-the-art results, particularly on large deformations and complex structural edits, with higher image fidelity and edit precision than existing approaches.
📝 Abstract
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped while preserving overall scene coherence. Previous diffusion-based editing methods often attempt to handle all relevant subtasks in a single step, which becomes difficult when transformations are large or structurally complex. We address this by proposing a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement. Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine. In experiments on our new GeoBench benchmark, which contains both 2D and 3D editing scenarios, FreeFine outperforms state-of-the-art alternatives in image fidelity and edit precision, especially under demanding transformations. Code and benchmark are available at: https://github.com/CIawevy/FreeFine
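
To make the three-stage decoupling concrete, the following is a minimal NumPy sketch of how the inputs to each stage could be prepared for the simplest 2D case, a pure translation. It is not taken from the FreeFine repository: the function names `translate` and `decoupled_edit_inputs` are hypothetical, the geometric transform is restricted to an integer pixel shift, and the diffusion-based inpainting and refinement steps themselves are omitted. The sketch only shows which pixels each stage would be responsible for.

```python
import numpy as np


def translate(arr, dy, dx):
    """Shift an (H, W) or (H, W, C) array by (dy, dx) pixels, zero-filling vacated pixels."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the content is shifted entirely out of frame
    src_rows = slice(max(0, -dy), min(h, h - dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    out[dst_rows, dst_cols] = arr[src_rows, src_cols]
    return out


def decoupled_edit_inputs(image, mask, dy, dx):
    """Prepare inputs for the three stages (hypothetical helper, translation only).

    image: (H, W, C) array; mask: (H, W) boolean object mask.
    Returns the coarse composite, the source-region mask to inpaint,
    and the target-region mask to refine.
    """
    mask = mask.astype(bool)

    # Stage 1: geometric transformation of the object and its mask.
    moved_mask = translate(mask, dy, dx)
    moved_obj = translate(image * mask[..., None], dy, dx)
    coarse = np.where(moved_mask[..., None], moved_obj, image)

    # Stage 2: the source region is whatever the object vacated
    # and the moved object does not cover again.
    inpaint_mask = mask & ~moved_mask

    # Stage 3: the target region is where the object now sits.
    refine_mask = moved_mask
    return coarse, inpaint_mask, refine_mask
```

In a full pipeline, `inpaint_mask` and `refine_mask` would be handed to a diffusion model for the inpainting and refinement stages; keeping them as separate masks is what lets the geometric step stay a plain, deterministic image operation.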