🤖 AI Summary
Existing inversion-based diffusion style transfer methods are computationally expensive and prone to visual distortions. This paper proposes the first inversion-free, forward-only style transfer framework for diffusion models. The method introduces a dual-correction flow to jointly model the evolution trajectories of content and style in parallel; incorporates a dynamic midpoint interpolation mechanism to fuse velocity fields, thereby avoiding inversion-induced artifacts; and integrates attention injection for fine-grained style control. The entire process requires only forward passes, enabling high-fidelity, style-consistent image generation while preserving the structural integrity of the source content. Experiments demonstrate significant improvements: 5–10× faster inference than inversion-based approaches, superior visual quality across diverse style-content combinations, and strong generalization to unseen styles and contents. This work establishes a new paradigm for efficient, training-free diffusion-based style transfer.
📄 Abstract
Style transfer, a pivotal task in image processing, synthesizes visually compelling images by seamlessly blending realistic content with artistic styles, enabling applications in photo editing and creative design. While mainstream training-free diffusion-based methods have greatly advanced style transfer in recent years, their reliance on computationally expensive inversion processes compromises efficiency and introduces visual distortions when inversion is inaccurate. To address these limitations, we propose a novel *inversion-free* style transfer framework based on dual rectified flows, which tackles the challenge of finding an unknown stylized distribution from two distinct inputs (content and style images) *with forward passes only*. Our approach predicts content and style trajectories in parallel, then fuses them through a dynamic midpoint interpolation that integrates velocities from both paths while adapting to the evolving stylized image. By jointly modeling the content, style, and stylized distributions, our velocity field design achieves robust fusion and avoids the shortcomings of naive overlays. Attention injection further guides style integration, enhancing visual fidelity, content preservation, and computational efficiency. Extensive experiments demonstrate generalization across diverse styles and content, providing an effective and efficient pipeline for style transfer.
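To make the forward-only fusion idea concrete, here is a minimal toy sketch of integrating two rectified-flow velocity fields and blending them at each step at the current (evolving) stylized state. Everything here is an illustrative assumption, not the paper's implementation: `v_content` and `v_style` are stand-in linear fields (in the real method they come from a pretrained diffusion model's velocity predictions conditioned on the content and style images), `style_weight` is a hypothetical fixed blending coefficient, and plain Euler integration replaces whatever solver the paper uses.

```python
import numpy as np

# Hypothetical fixed endpoints that the toy velocity fields pull toward.
# In the actual framework these roles are played by learned velocity
# predictions for the content and style trajectories.
CONTENT_TARGET = np.array([1.0, 0.0])
STYLE_TARGET = np.array([0.0, 1.0])

def v_content(x, t):
    """Toy rectified-flow velocity: points straight at the content target."""
    return (CONTENT_TARGET - x) / max(1.0 - t, 1e-6)

def v_style(x, t):
    """Toy rectified-flow velocity: points straight at the style target."""
    return (STYLE_TARGET - x) / max(1.0 - t, 1e-6)

def stylize(x0, steps=50, style_weight=0.5):
    """Forward-only generation: no inversion pass is ever run.

    At every step both velocity fields are evaluated at the *current*
    stylized state, then blended -- a simplified stand-in for the
    paper's dynamic midpoint interpolation, which adapts the fusion
    to the evolving stylized image.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = (1.0 - style_weight) * v_content(x, t) + style_weight * v_style(x, t)
        x = x + dt * v  # Euler step along the fused flow
    return x

result = stylize(np.zeros(2))
# With equal weights, the fused toy flow lands midway between the
# content and style targets: approximately [0.5, 0.5].
```

Because the two toy fields are linear, the equally weighted fusion converges exactly to the midpoint of the targets; with learned, nonlinear velocity fields the fused trajectory is what the paper's dynamic interpolation and attention injection shape into a stylized image.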