🤖 AI Summary
Designers struggle to effectively integrate text prompts, annotations, and scribbles—three distinct input modalities—during the image refinement phase in generative AI image tools.
Method: We conducted a preliminary comparative study with seven professional designers using a digital paper prototype, combining contextual interviews with task-based behavioral observation to analyze input strategies, cognitive load, and patterns of AI misinterpretation.
Results: We identify clear functional boundaries and complementary synergies among the modalities: annotations excel at spatial referencing and identifying in-image elements; scribbles support precise control over shape and position; text prompts best stimulate semantic creativity. Key bottlenecks include frequent AI misinterpretation of visual cues and the high cognitive cost of crafting effective text prompts. Grounded in these findings, we propose multimodal prompt design principles that balance expressivity, interpretability, and efficiency, offering both theoretical grounding and practical guidelines for interaction paradigms in next-generation GenAI design tools.
📝 Abstract
Generative AI (GenAI) tools are increasingly integrated into design workflows. While text prompts remain the primary input method for GenAI image tools, designers often struggle to craft effective ones. Moreover, research has primarily focused on input methods for ideation, with limited attention to refinement tasks. This study explores designers' preferences for three input methods (text prompts, annotations, and scribbles) through a preliminary digital paper-based study with seven professional designers. Designers preferred annotations for spatial adjustments and for referencing in-image elements, while scribbles were favored for specifying attributes such as shape, size, and position, often in combination with other methods. Text prompts excelled at providing detailed descriptions or when designers sought greater creativity from the GenAI. However, designers expressed concerns about AI misinterpreting annotations and scribbles, as well as the effort needed to craft effective text prompts. These insights inform GenAI interface design to better support refinement tasks, align with design workflows, and enhance communication with AI systems.