Multi-Prompt Style Interpolation for Fine-Grained Artistic Control

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-driven image style transfer methods rely on a single text prompt, which limits fine-grained and interpretable multi-style control. This paper proposes a multi-prompt style interpolation framework that fuses diverse artistic styles (such as Cubism, Impressionism, and cartoon) in a single image, with spatially and semantically controllable editing. Key contributions include: (1) the first multi-prompt embedding mixer with adaptive weighted interpolation; (2) a hierarchical masked directional loss for regional style consistency; and (3) integration with the StyleMamba state-space model and cross-modal alignment optimization. Experiments report improvements in style fidelity, text-image alignment, and artistic expressiveness. User studies favor the method over single-prompt and linear interpolation baselines, and inference remains efficient.

📝 Abstract
Text-driven image style transfer has seen remarkable progress with methods leveraging cross-modal embeddings for fast, high-quality stylization. However, most existing pipelines assume a single textual style prompt, limiting the range of artistic control and expressiveness. In this paper, we propose a novel multi-prompt style interpolation framework that extends the recently introduced StyleMamba approach. Our method supports blending or interpolating among multiple textual prompts (e.g., "cubism," "impressionism," and "cartoon"), allowing the creation of nuanced or hybrid artistic styles within a single image. We introduce a Multi-Prompt Embedding Mixer combined with Adaptive Blending Weights to enable fine-grained control over the spatial and semantic influence of each style. Further, we propose a Hierarchical Masked Directional Loss to refine region-specific style consistency. Experiments and user studies confirm that our approach outperforms single-prompt baselines and naive linear combinations of styles, achieving superior style fidelity, text-image alignment, and artistic flexibility, all while maintaining the computational efficiency offered by the state-space formulation.
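The abstract does not spell out how the embedding mixer combines prompts, so the following is only a minimal sketch of one plausible reading: a convex combination of per-prompt text embeddings, with blending weights obtained from a softmax over logits. The function names (`softmax`, `l2_normalize`, `mix_prompt_embeddings`), the use of a softmax, and the final re-normalization are assumptions for illustration, not the paper's formulation.

```python
import math


def softmax(logits):
    """Turn arbitrary real-valued logits into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mix_prompt_embeddings(embeddings, logits):
    """Blend several prompt embeddings into a single style embedding.

    embeddings: list of equal-length vectors, one per text prompt
                (hypothetical stand-ins for text-encoder outputs).
    logits:     one blending logit per prompt (assumed to be learned
                or user-set; the softmax is an assumption).
    Returns (mixed_embedding, weights).
    """
    if not embeddings or len(embeddings) != len(logits):
        raise ValueError("need one logit per embedding")
    weights = softmax(logits)
    dim = len(embeddings[0])
    mixed = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return l2_normalize(mixed), weights
```

With equal logits this reduces to a plain average of the prompts, which corresponds to the naive linear-combination baseline the abstract compares against; unequal logits shift the blend toward one style.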
Problem

Research questions and friction points this paper is trying to address.

Extends single-prompt style transfer to multi-prompt interpolation
Enables nuanced and hybrid artistic styles in single images
Improves style fidelity and text-image alignment efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-prompt style interpolation framework
Multi-Prompt Embedding Mixer with Adaptive Blending Weights
Hierarchical Masked Directional Loss for style consistency
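The summary names a masked directional loss but gives no formula. As a rough illustration only, the sketch below applies a common CLIP-style directional loss (one minus the cosine similarity between the image-embedding change and the text-embedding change) separately per masked region and averages by region weight. The region structure, the weighting scheme, and all function names are assumptions; the hierarchical aspect is not modeled here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (na * nb)


def directional_loss(img_src, img_out, txt_src, txt_style):
    """1 - cos(delta_image, delta_text) for one region.

    All arguments are embedding vectors (hypothetical encoder outputs):
    the content and stylized image, and the source and style text.
    """
    delta_img = [o - s for o, s in zip(img_out, img_src)]
    delta_txt = [t - s for t, s in zip(txt_style, txt_src)]
    return 1.0 - cosine(delta_img, delta_txt)


def masked_directional_loss(regions):
    """Weighted mean of per-region directional losses.

    regions: list of (weight, img_src, img_out, txt_src, txt_style),
             where weight is e.g. the mask's area fraction (assumed).
    """
    total_weight = sum(r[0] for r in regions)
    if total_weight <= 0.0:
        raise ValueError("region weights must sum to a positive value")
    weighted = sum(r[0] * directional_loss(*r[1:]) for r in regions)
    return weighted / total_weight
```

A region whose stylized embedding moves in the same direction as its text prompt contributes a loss near 0, while one that moves in the opposite direction contributes a loss near 2.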
🔎 Similar Papers
No similar papers found.
Lei Chen
Department of Computer Science, Fictitious University of Technology, Country
Hao Li
Institute of AI Research, Imaginary Institute, Country
Yuxin Zhang
Department of Computer Science, Fictitious University of Technology, Country
Chao Li
Dept. of Computer Science, Eastern Asia Institute of Technology, Beijing, China
Kai Wen
Stanford University
Quantum computation and communication, Optimization, Stochastic simulation