PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization

📅 2025-09-15

📈 Citations: 0

✨ Influential: 0

career value

214K/year

🤖 AI Summary

Manual prompt refinement in text-to-image generation is time-consuming and inefficient, requiring users to iteratively adjust prompts. Method: This paper proposes the first automated prompt optimization framework leveraging multi-agent collaboration and chain-of-thought (CoT) reasoning. It decomposes an ambiguous initial prompt into specialized subtasks—including semantic expansion, contextual completion, and style alignment—each executed by dedicated agents. A self-evaluation module and user-feedback-driven iterative refinement mechanism further enhance optimization. The framework is model-agnostic and seamlessly integrates with mainstream diffusion models (e.g., Stable Diffusion, SDXL, DALL·E). Contribution/Results: Experiments demonstrate a 62% reduction in average user iteration rounds compared to baselines, a 28% improvement in Fréchet Inception Distance (FID), and significantly higher user satisfaction. The approach exhibits strong generalizability and practical potential for industrial deployment.

Technology Category

Application Category

📝 Abstract

The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context-often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.

Problem

Research questions and friction points this paper is trying to address.

Automates iterative prompt optimization for text-to-image generation

Transforms vague user prompts into comprehensive refined prompts

Reduces iterations needed for high-quality image generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent framework automates iterative prompt optimization

Chain-of-Thought reasoning infers hidden context details

Model-agnostic design integrates with various T2I models

🔎 Similar Papers

No similar papers found.