Adaptive Prompt Elicitation for Text-to-Image Generation

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current text-to-image prompting methods when users struggle to articulate their visual intent due to ambiguous instructions or unfamiliarity with model capabilities. To overcome this, the authors propose Adaptive Prompt Elicitation (APE), a novel approach that integrates language model priors with an information-theoretic framework to dynamically generate interpretable visual queries. These queries guide users in iteratively refining their intentions while automatically compiling high-quality prompts. By moving beyond conventional text-only prompting paradigms, APE significantly improves alignment between user intent and generated outputs, as demonstrated on the IDEA-Bench and DesignBench benchmarks. User studies further reveal a 19.8% improvement in intent alignment for complex tasks without imposing additional cognitive load.
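The summary describes APE as choosing visual queries under an information-theoretic criterion: ask the question whose answer most reduces uncertainty about the user's latent intent. The sketch below illustrates that selection principle only; the feature names, the binary-feature intent representation, and the prior probabilities are all assumptions for illustration, not the paper's actual implementation.

```python
import math

# Hypothetical intent representation: a set of binary feature requirements
# (e.g. "warm palette required?"). In APE-style elicitation, a prior over
# these features could come from a language model; here it is hard-coded.
FEATURES = ["warm_palette", "flat_style", "minimal_background"]
PRIOR = {"warm_palette": 0.5, "flat_style": 0.7, "minimal_background": 0.3}

def entropy(p):
    """Binary entropy in bits; 0 when p is 0 or 1 (no uncertainty)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_information_gain(feature):
    """For a visual query that directly reveals one binary feature, the
    expected gain equals that feature's current entropy, since the
    posterior entropy drops to zero once the user answers."""
    return entropy(PRIOR[feature])

def next_query():
    """Greedily pick the feature whose visual query is most informative."""
    return max(FEATURES, key=expected_information_gain)

print(next_query())  # -> "warm_palette" (p = 0.5 is maximally uncertain)
```

A full system would iterate this loop: show the chosen query as image variants, update the posterior from the user's pick, and finally compile the confirmed requirements into a prompt.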

📝 Abstract
Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively poses visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without added workload. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
user intent alignment
ambiguous prompts
interactive prompting
prompt refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Prompt Elicitation
intent inference
information-theoretic framework
text-to-image generation
visual queries