🤖 AI Summary
In text-to-image diffusion models, increasing style strength often degrades content fidelity, making it difficult to preserve semantic content while expressing stylistic attributes across varying intensity levels. To address this, we propose a content-style subspace blending mechanism, the first approach to systematically optimize the content-style Pareto frontier. Our method employs LoRA-based fine-tuning for efficient parameter control, leverages subspace decomposition to disentangle content and style representations, and introduces a novel Content-Style Balance Loss (CSBLoss) to dynamically regulate their trade-off. The framework maintains high generation quality while substantially improving content similarity across style intensities. Quantitative evaluation shows large reductions in Inverted Generational Distance (IGD) and Generational Distance (GD), outperforming state-of-the-art methods across multiple benchmarks and yielding superior, more robust content-style co-generation.
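To make the blending idea concrete, here is a minimal sketch of one way to combine a content LoRA weight update with a style LoRA update at a chosen intensity: project the style update off the dominant content directions, then scale it. The function `blend_lora_delta`, the rank-`r` content subspace, and the orthogonal projection are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def blend_lora_delta(delta_content: torch.Tensor,
                     delta_style: torch.Tensor,
                     intensity: float,
                     rank: int = 4) -> torch.Tensor:
    """Blend a content LoRA update with a style LoRA update.

    The style update is first projected away from the top-`rank` content
    directions, so raising `intensity` (in [0, 1]) adds style without
    overwriting the dominant content subspace.
    """
    # Left singular vectors of the content update span its output subspace.
    U, _, _ = torch.linalg.svd(delta_content, full_matrices=False)
    U_c = U[:, :rank]
    # Remove the component of the style update that overlaps that subspace.
    style_orth = delta_style - U_c @ (U_c.T @ delta_style)
    return delta_content + intensity * style_orth

# Hypothetical shapes for one attention projection's update (delta_W = B @ A).
delta_c = torch.randn(320, 768) * 0.01
delta_s = torch.randn(320, 768) * 0.01
blended = blend_lora_delta(delta_c, delta_s, intensity=0.7)
```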
📝 Abstract
Recent advances in text-to-image diffusion models have significantly improved the personalization and stylization of generated images. However, previous studies have assessed content similarity only at a single style intensity. In our experiments, we observe that increasing style intensity leads to a substantial loss of content features, resulting in a suboptimal content-style frontier. To address this, we propose a novel approach that expands the content-style frontier by leveraging Content-Style Subspace Blending and a Content-Style Balance loss. Our method improves content similarity across varying style intensities, significantly broadening the content-style frontier. Extensive experiments demonstrate that our approach outperforms existing techniques in both qualitative and quantitative evaluations, achieving a superior content-style trade-off with markedly lower Inverted Generational Distance (IGD) and Generational Distance (GD) scores than current methods.
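For context, GD and IGD are standard multi-objective metrics that compare a set of obtained trade-off points against a reference Pareto front: GD averages each obtained point's distance to its nearest reference point (convergence), while IGD averages each reference point's distance to its nearest obtained point (convergence plus coverage). Below is a minimal NumPy sketch of the simple arithmetic-mean variant; the objective values are hypothetical (lower is better on both axes):

```python
import numpy as np

def generational_distance(solutions: np.ndarray, reference: np.ndarray) -> float:
    """GD: mean distance from each obtained point to the nearest reference-front point."""
    dists = np.linalg.norm(solutions[:, None, :] - reference[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

def inverted_generational_distance(solutions: np.ndarray, reference: np.ndarray) -> float:
    """IGD: mean distance from each reference-front point to the nearest obtained point."""
    dists = np.linalg.norm(reference[:, None, :] - solutions[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

# Hypothetical 2-D objectives: (1 - content similarity, 1 - style similarity).
front = np.array([[0.10, 0.40], [0.20, 0.25], [0.35, 0.15]])  # reference Pareto front
ours = np.array([[0.12, 0.42], [0.22, 0.27], [0.37, 0.18]])   # obtained trade-off points
print(generational_distance(ours, front), inverted_generational_distance(ours, front))
```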