FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields

πŸ“… 2025-07-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing drag-based image editing methods neglect global geometric structure, leading to deformation artifacts and inaccurate control-point alignment; moreover, the absence of ground-truth deformation annotations hinders objective evaluation. To address these issues, we propose FlowDrag: a method that constructs a 3D mesh from the image to encode geometric priors, optimizes an energy function to deform the mesh according to user-defined drag points, projects the resulting displacements into 2D, and incorporates them into a UNet-based denoising process to preserve fine-grained detail. We also introduce VFD (VidFrameDrag), a video-derived benchmark that provides ground-truth frames for drag edits. Evaluated on VFD Bench and DragBench, FlowDrag achieves significant improvements in control-point accuracy (+12.6%) and structural stability (+9.3% PSNR), enabling precise, geometry-consistent, and highly controllable local editing.

πŸ“ Abstract
Drag-based editing allows precise object manipulation through point-based control, offering user convenience. However, current methods often suffer from geometric inconsistency: by focusing exclusively on matching user-defined points, they neglect the broader geometry, leading to artifacts or unstable edits. We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image and uses an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment while preserving structural integrity. Additionally, existing drag-editing benchmarks provide no ground truth, making it difficult to assess how accurately the edits match the intended transformations. To address this, we present the VFD (VidFrameDrag) benchmark dataset, which provides ground-truth frames using consecutive shots in a video dataset. FlowDrag outperforms existing drag-based editing methods on both VFD Bench and DragBench.
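To make the "projected into 2D" step concrete, here is a minimal sketch of how 3D mesh vertex displacements could be turned into a sparse 2D flow field via a pinhole camera projection. The function name `project_displacements`, the pinhole model, and the nearest-pixel splatting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_displacements(verts, disps, K, H, W):
    """Project 3D mesh vertex displacements into a sparse 2D flow field.

    verts: (N, 3) vertex positions in camera coordinates (z > 0)
    disps: (N, 3) per-vertex 3D displacements from the mesh deformation
    K:     (3, 3) pinhole camera intrinsics (an assumed camera model)
    Returns an (H, W, 2) flow field with each vertex's 2D displacement
    splatted at its projected pixel (zeros elsewhere).
    """
    def to_pixels(pts):
        proj = (K @ pts.T).T              # (N, 3) homogeneous image coords
        return proj[:, :2] / proj[:, 2:3]  # perspective divide -> (u, v)

    uv_before = to_pixels(verts)
    uv_after = to_pixels(verts + disps)
    flow2d = uv_after - uv_before          # per-vertex 2D displacement

    field = np.zeros((H, W, 2))
    px = np.round(uv_before).astype(int)   # nearest-pixel splat (simplistic)
    for (u, v), f in zip(px, flow2d):
        if 0 <= v < H and 0 <= u < W:
            field[v, u] = f
    return field
```

In practice such a sparse field would be densified (e.g. by rasterizing mesh faces or interpolating) before being used to guide the UNet denoising; the sketch only covers the projection itself.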
Problem

Research questions and friction points this paper is trying to address.

Addresses geometric inconsistency in drag-based image editing
Improves 3D-aware deformation using mesh-guided flow fields
Introduces benchmark dataset for evaluating drag-editing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D mesh deformation for accurate image editing
UNet denoising with 2D displacement projection
VFD benchmark for ground-truth evaluation