Multi-Prompt Style Interpolation for Fine-Grained Artistic Control

📅 2025-03-20

🤖 AI Summary

Existing text-driven image style transfer methods rely on a single text prompt, which limits fine-grained, interpretable control over multiple styles. This paper proposes a multi-prompt style interpolation framework that fuses diverse artistic styles (such as Cubism, Impressionism, and cartoon) in a single image, with spatially and semantically controllable editing. Key contributions include: (1) a multi-prompt embedding mixer with adaptive weighted interpolation; (2) a hierarchical masked directional loss for regional style consistency; and (3) an extension of the StyleMamba state-space approach with cross-modal alignment optimization. Experiments and user studies report better style fidelity, text-image alignment, and artistic flexibility than single-prompt and naive linear interpolation baselines, while retaining efficient inference.

📝 Abstract

Text-driven image style transfer has seen remarkable progress with methods leveraging cross-modal embeddings for fast, high-quality stylization. However, most existing pipelines assume a single textual style prompt, limiting the range of artistic control and expressiveness. In this paper, we propose a novel multi-prompt style interpolation framework that extends the recently introduced StyleMamba approach. Our method supports blending or interpolating among multiple textual prompts (e.g., "cubism," "impressionism," and "cartoon"), allowing the creation of nuanced or hybrid artistic styles within a single image. We introduce a Multi-Prompt Embedding Mixer combined with Adaptive Blending Weights to enable fine-grained control over the spatial and semantic influence of each style. Further, we propose a Hierarchical Masked Directional Loss to refine region-specific style consistency. Experiments and user studies confirm our approach outperforms single-prompt baselines and naive linear combinations of styles, achieving superior style fidelity, text-image alignment, and artistic flexibility, all while maintaining the computational efficiency offered by the state-space formulation.
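
The abstract does not spell out how the Multi-Prompt Embedding Mixer combines prompts, so the following is only a minimal sketch of one plausible reading: each prompt's text embedding is weighted by a softmax over per-prompt scores, and the weighted sum is re-normalized. The function names, the softmax normalization, and the toy 3-dimensional embeddings are all assumptions made for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Turn arbitrary per-prompt scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mix_prompt_embeddings(embeddings, logits):
    """Hypothetical mixer: blend text embeddings with softmax weights.

    embeddings: list of equal-length vectors, one per style prompt.
    logits: one unnormalized blending score per prompt (assumed to be
            learned or user-supplied; the source does not say which).
    Returns the unit-length blended embedding and the weights used.
    """
    if not embeddings or len(embeddings) != len(logits):
        raise ValueError("need one logit per embedding")
    weights = softmax(logits)
    dim = len(embeddings[0])
    mixed = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return l2_normalize(mixed), weights


if __name__ == "__main__":
    # Toy stand-ins for the embeddings of "cubism" and "impressionism".
    cubism = [1.0, 0.0, 0.0]
    impressionism = [0.0, 1.0, 0.0]
    blended, weights = mix_prompt_embeddings([cubism, impressionism], [1.0, 0.0])
    print(weights, blended)
```

Raising the score of one prompt shifts the blended embedding toward that style, which is the kind of per-style control the abstract describes. A spatially varying version would need one weight vector per region or pixel, which this sketch does not attempt.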

Problem

Research questions and friction points this paper is trying to address.

Extends single-prompt style transfer to multi-prompt interpolation

Enables nuanced and hybrid artistic styles in single images

Improves style fidelity and text-image alignment efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-prompt style interpolation framework

Multi-Prompt Embedding Mixer with Adaptive Blending Weights

Hierarchical Masked Directional Loss for style consistency
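
The summary names the Hierarchical Masked Directional Loss without defining it. As a rough illustration, the sketch below applies the directional loss commonly used in text-driven stylization (one minus the cosine similarity between the image-embedding shift and the text-embedding shift) to each masked region and averages the results by region area. The area weighting, the dictionary layout, and all names are assumptions, and only a single level of the hierarchy is modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def directional_loss(img_src, img_out, txt_src, txt_style):
    """1 - cos(image shift, text shift); 0 when the two shifts align."""
    d_img = [o - s for o, s in zip(img_out, img_src)]
    d_txt = [t - s for t, s in zip(txt_style, txt_src)]
    return 1.0 - cosine(d_img, d_txt)


def masked_directional_loss(regions):
    """Area-weighted mean of per-region directional losses (assumed form).

    Each region is a dict holding embeddings of the masked content crop
    (img_src), the masked stylized crop (img_out), a neutral source text
    (txt_src), the region's style prompt (txt_style), and its mask area.
    """
    total_area = sum(r["area"] for r in regions)
    if total_area <= 0:
        raise ValueError("regions must cover a positive area")
    weighted = sum(
        r["area"]
        * directional_loss(r["img_src"], r["img_out"], r["txt_src"], r["txt_style"])
        for r in regions
    )
    return weighted / total_area


if __name__ == "__main__":
    aligned = {"img_src": [0.0, 0.0], "img_out": [1.0, 0.0],
               "txt_src": [0.0, 0.0], "txt_style": [2.0, 0.0], "area": 3.0}
    opposed = {"img_src": [0.0, 0.0], "img_out": [-1.0, 0.0],
               "txt_src": [0.0, 0.0], "txt_style": [2.0, 0.0], "area": 1.0}
    print(masked_directional_loss([aligned, opposed]))
```

In practice the embeddings would come from a pretrained image-text encoder applied to masked crops; the toy 2-dimensional vectors here only show how a region whose stylization moves against its prompt raises the overall loss.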

🔎 Similar Papers

DiffArtist: Towards Structure and Appearance Controllable Image Stylization