🤖 AI Summary
This work addresses the limitations of current text-to-image prompting when users struggle to articulate their visual intent, whether because their instructions are ambiguous or because they are unfamiliar with model capabilities. The authors propose Adaptive Prompt Elicitation (APE), an approach that combines language model priors with an information-theoretic framework to dynamically generate interpretable visual queries. These queries guide users in iteratively refining their intent while the system automatically compiles high-quality prompts. By moving beyond the conventional text-only prompting paradigm, APE improves alignment between user intent and generated outputs, as demonstrated on the IDEA-Bench and DesignBench benchmarks. A user study further shows a 19.8% improvement in intent alignment on complex tasks without imposing additional cognitive load.
📝 Abstract
Aligning text-to-image generation with user intent remains challenging, as users often provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without additional workload. Our work contributes a principled approach to prompting that offers general users an effective and efficient complement to the prevailing prompt-based interaction paradigm for text-to-image models.
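The abstract sketches, but does not specify, how queries are chosen under the information-theoretic formulation. Below is a minimal sketch of one standard instantiation: greedy selection of the visual query with maximal expected information gain over a distribution of candidate feature requirements. Every name here (`entropy`, `select_query`, the toy hypotheses and `likelihood` model) is an illustrative assumption, not APE's actual interface.

```python
import math

# Minimal sketch of greedy expected-information-gain query selection,
# assuming latent intent is a distribution over candidate feature
# requirements. Illustrative only; not APE's actual formulation.

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {hypothesis: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood, query, answer):
    """Bayes update of the intent distribution after observing an answer."""
    unnorm = {h: p * likelihood(answer, query, h) for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def expected_info_gain(prior, likelihood, query, answers):
    """Expected reduction in intent entropy from asking `query`."""
    h0 = entropy(prior)
    gain = 0.0
    for a in answers:
        # Marginal probability that the user gives answer `a`.
        p_a = sum(p * likelihood(a, query, h) for h, p in prior.items())
        if p_a > 0:
            gain += p_a * (h0 - entropy(posterior(prior, likelihood, query, a)))
    return gain

def select_query(prior, likelihood, queries, answers):
    """Greedily pick the query expected to be most informative about intent."""
    return max(queries, key=lambda q: expected_info_gain(prior, likelihood, q, answers))

if __name__ == "__main__":
    # Toy example: two intent hypotheses, two candidate visual queries.
    prior = {"flat_style": 0.5, "photoreal": 0.5}

    def likelihood(answer, query, hypothesis):
        # "style?" perfectly separates the hypotheses; "color?" is uninformative.
        if query == "style?":
            return 1.0 if (answer == "flat") == (hypothesis == "flat_style") else 0.0
        return 0.5

    print(select_query(prior, likelihood, ["style?", "color?"], ["flat", "photo"]))
    # -> "style?" (1 bit of expected gain vs. 0 for "color?")
```

Greedy one-step information gain is the common choice in interactive elicitation systems because it is tractable per turn; whether APE uses this exact criterion, or a variant, is not stated in the abstract.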