🤖 AI Summary
Problem: Manual prompt refinement in text-to-image generation is time-consuming, requiring users to adjust prompts over many iterations. Method: This paper proposes the first automated prompt optimization framework leveraging multi-agent collaboration and chain-of-thought (CoT) reasoning. It decomposes an ambiguous initial prompt into specialized subtasks (semantic expansion, contextual completion, and style alignment), each executed by a dedicated agent. A self-evaluation module and a user-feedback-driven iterative refinement mechanism further improve the optimized prompt. The framework is model-agnostic and integrates with mainstream text-to-image models (e.g., Stable Diffusion, SDXL, DALL·E). Contribution/Results: Experiments demonstrate a 62% reduction in average user iteration rounds compared to baselines, a 28% improvement in Fréchet Inception Distance (FID, where lower is better), and significantly higher user satisfaction. The approach exhibits strong generalizability and practical potential for industrial deployment.
📝 Abstract
The rapid advancement of generative AI has democratized access to powerful tools such as text-to-image (T2I) models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context, often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
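The agent pipeline described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract names only the self-evaluation and feedback-tuning agents, so the `infer_context` and `enrich_scene` roles, the instruction strings, the `PromptState` class, and the `echo_llm` stand-in are all hypothetical. Any real language model could be substituted for `echo_llm` as long as it maps a query string to a completion string.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function mapping a query string to a completion.
LLM = Callable[[str], str]


@dataclass
class PromptState:
    original: str   # the user's short, vague prompt
    refined: str    # the current enriched prompt
    history: List[str] = field(default_factory=list)  # agents applied so far


def run_agent(name: str, instruction: str, state: PromptState, llm: LLM) -> PromptState:
    """Apply one agent: ask the model to rewrite the current prompt."""
    query = (
        f"{instruction}\n"
        "Think step by step, then give the final prompt.\n"
        f"Original prompt: {state.original}\n"
        f"Current prompt: {state.refined}"
    )
    return PromptState(state.original, llm(query), state.history + [name])


# Assumed roles and instructions; only the last one is named in the abstract.
AGENTS = [
    ("infer_context", "Infer the hidden intent and context behind the prompt."),
    ("enrich_scene", "Add scene, background, and style details."),
    ("self_evaluate", "Compare with the original prompt and correct any drift."),
]


def sculpt(prompt: str, llm: LLM) -> PromptState:
    """Run the automatic agents in sequence on a raw user prompt."""
    state = PromptState(original=prompt, refined=prompt)
    for name, instruction in AGENTS:
        state = run_agent(name, instruction, state, llm)
    return state


def apply_feedback(state: PromptState, feedback: str, llm: LLM) -> PromptState:
    """Feedback-tuning step: revise the prompt using the user's comment."""
    instruction = f"Revise the prompt to address this user feedback: {feedback}"
    return run_agent("feedback_tuning", instruction, state, llm)


def echo_llm(query: str) -> str:
    """Stand-in for a real model: tags the current prompt as refined."""
    current = query.rsplit("Current prompt: ", 1)[1]
    return current + " [refined]"


if __name__ == "__main__":
    state = sculpt("a lonely robot", echo_llm)
    state = apply_feedback(state, "make the lighting warmer", echo_llm)
    print(state.history)
    print(state.refined)
```

Because each agent only reads and returns a prompt string, the final `refined` value can be passed to any T2I model, which is one way to read the model-agnostic claim in the abstract.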