ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
LoRA-based style transfer from a single reference image suffers from content inconsistency, style misalignment, and content leakage, limitations the authors trace to the standard diffusion parameterization, which learns to predict noise. ConsisLoRA addresses these by optimizing the LoRA weights to predict the original image rather than the noise. A two-step training strategy decouples the learning of content and style from the reference image, and a stepwise loss transition captures both the global structure and local details of the content image. An inference guidance method additionally enables continuous control over content and style strengths. Qualitative and quantitative evaluations show significant improvements in content and style consistency and a marked reduction in content leakage.

📝 Abstract
Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, our method demonstrates significant improvements in content and style consistency while effectively reducing content leakage.
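The central change described above — training the LoRA weights to reconstruct the original image instead of the injected noise — can be sketched in terms of the standard DDPM forward process. This is a minimal NumPy sketch of the generic x0-parameterization, not the authors' code; the function names are my own:

```python
import numpy as np

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion step: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def eps_to_x0(x_t, eps_pred, alpha_bar):
    """Invert the forward step to get the x0 estimate implied by a
    noise prediction: x0_hat = (x_t - sqrt(1-a_bar)*eps_pred)/sqrt(a_bar)."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

def x0_prediction_loss(x0, x_t, eps_pred, alpha_bar):
    """MSE against the original image (the objective ConsisLoRA optimizes),
    instead of the usual MSE against the injected noise."""
    x0_hat = eps_to_x0(x_t, eps_pred, alpha_bar)
    return float(np.mean((x0 - x0_hat) ** 2))
```

Note that a perfect noise prediction yields zero loss under either parameterization; the two objectives differ in how they weight errors across timesteps, which is what the paper exploits for content and style consistency.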
Problem

Research questions and friction points this paper is trying to address.

Addresses content inconsistency in LoRA-based style transfer.
Reduces style misalignment and content leakage.
Improves consistency of both global structure and local details.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes LoRA weights to predict the original image rather than noise
Two-step training that decouples content and style learning
Stepwise loss transition to capture global structure and local details
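Beyond the training-time contributions above, the paper describes inference guidance that continuously adjusts content and style strengths. One common way to realize such independent dials is a compositional classifier-free-guidance combination of noise predictions; the sketch below is a hypothetical illustration of that idea, and the paper's exact guidance rule may differ (all names here are assumptions):

```python
import numpy as np

def dual_guidance(eps_uncond, eps_content, eps_style, w_content, w_style):
    """Combine an unconditional prediction with content- and
    style-conditioned predictions using independent scales, so each
    strength can be tuned continuously at inference time.
    Hypothetical compositional CFG form, not the paper's exact rule."""
    return (eps_uncond
            + w_content * (eps_content - eps_uncond)
            + w_style * (eps_style - eps_uncond))
```

Setting `w_content` or `w_style` to zero removes that condition's influence entirely, while values above one amplify it, which is the kind of continuous control the abstract describes.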
Authors
Bolin Chen (Sun Yat-sen University)
Baoquan Zhao (Sun Yat-sen University) · 3D point cloud processing and compression · Multimedia content analysis · Open Educational Resources
Haoran Xie (Lingnan University)
Yi Cai (South China University of Technology)
Qing Li (The Hong Kong Polytechnic University)
Xudong Mao (Sun Yat-sen University) · Computer Vision · Deep Learning