FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields

πŸ“… 2025-07-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing drag-based image editing methods neglect global geometric structure, leading to deformation artifacts and inaccurate control-point alignment; moreover, the absence of ground-truth deformation annotations hinders objective evaluation. To address these issues, we propose FlowDrag: a method that constructs a 3D mesh from the image to encode geometric priors, optimizes an energy function to deform the mesh according to user-defined drag points, projects the resulting displacements into 2D, and incorporates them into a UNet-based denoising process to preserve fine-grained detail. We also introduce VFD (VidFrameDrag), a video-derived benchmark that provides ground-truth frames for drag edits. Evaluated on VFD Bench and DragBench, FlowDrag achieves significant improvements in control-point accuracy (+12.6%) and structural stability (+9.3% PSNR), enabling precise, geometry-consistent, and highly controllable local editing.

πŸ“ Abstract
Drag-based editing allows precise object manipulation through point-based control, offering user convenience. However, current methods often suffer from geometric inconsistency: by focusing exclusively on matching user-defined points, they neglect the broader geometry, leading to artifacts or unstable edits. We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image and uses an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment while preserving structural integrity. Additionally, existing drag-editing benchmarks provide no ground truth, making it difficult to assess how accurately the edits match the intended transformations. To address this, we present the VFD (VidFrameDrag) benchmark dataset, which provides ground-truth frames using consecutive shots in a video dataset. FlowDrag outperforms existing drag-based editing methods on both VFD Bench and DragBench.
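To make the "projected into 2D" step concrete, here is a minimal sketch of how 3D mesh vertex displacements could be turned into a sparse 2D flow field via a pinhole camera projection. The function name `project_displacements`, the pinhole model, and the nearest-pixel splatting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_displacements(verts, disps, K, H, W):
    """Project 3D mesh vertex displacements into a sparse 2D flow field.

    verts: (N, 3) vertex positions in camera coordinates (z > 0)
    disps: (N, 3) per-vertex 3D displacements from the mesh deformation
    K:     (3, 3) pinhole camera intrinsics (an assumed camera model)
    Returns an (H, W, 2) flow field with each vertex's 2D displacement
    splatted at its projected pixel (zeros elsewhere).
    """
    def to_pixels(pts):
        proj = (K @ pts.T).T              # (N, 3) homogeneous image coords
        return proj[:, :2] / proj[:, 2:3]  # perspective divide -> (u, v)

    uv_before = to_pixels(verts)
    uv_after = to_pixels(verts + disps)
    flow2d = uv_after - uv_before          # per-vertex 2D displacement

    field = np.zeros((H, W, 2))
    px = np.round(uv_before).astype(int)   # nearest-pixel splat (simplistic)
    for (u, v), f in zip(px, flow2d):
        if 0 <= v < H and 0 <= u < W:
            field[v, u] = f
    return field
```

In practice such a sparse field would be densified (e.g. by rasterizing mesh faces or interpolating) before being used to guide the UNet denoising; the sketch only covers the projection itself.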
Problem

Research questions and friction points this paper is trying to address.

Addresses geometric inconsistency in drag-based image editing
Improves 3D-aware deformation using mesh-guided flow fields
Introduces benchmark dataset for evaluating drag-editing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D mesh deformation for accurate image editing
UNet denoising with 2D displacement projection
VFD benchmark for ground-truth evaluation