🤖 AI Summary
LoRA-based style transfer for single-image scenarios suffers from critical challenges—including content distortion, style misalignment, and content leakage—stemming from the inherent noise-prediction paradigm of standard diffusion models, which struggles to simultaneously preserve content fidelity and ensure style consistency. To address this, we propose a novel LoRA fine-tuning paradigm oriented toward *original-image prediction*, replacing the conventional noise-prediction objective. We introduce a two-stage decoupled training strategy that separately optimizes content reconstruction and style injection. Furthermore, we incorporate a stepwise loss scheduling mechanism and tunable inference guidance to enable continuous, controllable adjustment of content and style strengths. This work is the first to integrate diffusion reverse parameterization reconstruction with gradient-aware style-content decoupling into the LoRA fine-tuning framework. Experiments demonstrate a 37.2% reduction in content leakage rate and consistent qualitative and quantitative superiority over state-of-the-art methods across multiple benchmarks.
📝 Abstract
Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, our method demonstrates significant improvements in content and style consistency while effectively reducing content leakage.