🤖 AI Summary
To address the difficulty novice users face in crafting effective prompts for text-to-image generation, this paper introduces PromptMap, a map-like interaction style with semantic zoom that lets users freely explore a vast collection of synthetic, LLM-generated prompts. Images are grouped visually by semantic similarity, allowing users to discover relevant examples and adapt them to their own goals. The system was evaluated in a between-subject online study (n = 60) and a qualitative within-subject study (n = 12); the results show that PromptMap supported users in crafting prompts by providing examples, and demonstrate the feasibility of using LLMs to build large example collections. The work contributes a new interaction style that helps users unfamiliar with prompting reach satisfactory image outputs.
📝 Abstract
Recent technological advances have popularized image generation among the general public. Crafting effective prompts can, however, be difficult for novice users. To tackle this challenge, we developed PromptMap, a new interaction style for text-to-image AI that allows users to freely explore a vast collection of synthetic prompts through a map-like view with semantic zoom. PromptMap groups images visually by their semantic similarity, allowing users to discover relevant examples. We evaluated PromptMap in a between-subject online study ($n=60$) and a qualitative within-subject study ($n=12$). We found that PromptMap supported users in crafting prompts by providing them with examples. We also demonstrated the feasibility of using LLMs to create vast example collections. Our work contributes a new interaction style that supports users unfamiliar with prompting in achieving a satisfactory image output.