POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

📅 2025-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current text-to-image models tend to produce homogeneous outputs and rely on interaction methods that can be difficult for beginners, which hinders personalized exploration for users in the early stages of creative work. To address this, we propose a framework that automatically discovers implicit semantic dimensions along which outputs are homogeneous, expands the output space along those dimensions, and incorporates real-time user feedback to personalize how the generative space of diffusion models is explored. The approach integrates CLIP-based feature analysis, contrastive learning for semantic disentanglement, online modeling of user feedback, and conditional control mechanisms to support dynamic, value-aligned, co-creative prompt exploration. Evaluated with 28 users across four creative task domains, the method significantly increases perceived output diversity, reduces the average number of prompt iterations needed to reach user satisfaction by 37%, and increases reflective exploration behaviors by 2.1×.
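The summary mentions CLIP-based feature analysis. One common way to quantify how homogeneous a batch of generated images is, is the mean pairwise cosine similarity of their embeddings. The sketch below is an illustration under stated assumptions, not POET's implementation: it assumes the embeddings were already computed (for example by a CLIP image encoder) and are passed in as plain lists of floats, and the function names `cosine` and `mean_pairwise_cosine` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def mean_pairwise_cosine(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of embeddings.

    Values near 1.0 suggest a homogeneous batch; lower values suggest more variety.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy 3-D vectors standing in for real (hypothetical) CLIP image embeddings.
similar_batch = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
varied_batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(round(mean_pairwise_cosine(similar_batch), 3))
print(mean_pairwise_cosine(varied_batch))  # orthogonal vectors -> 0.0
```

A pairwise-similarity score like this is only a computational proxy; the study itself reports perceived diversity as judged by users.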

📝 Abstract

State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks, offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. They also employ interaction methods that may be difficult for beginners. Because creative end users often work in diverse, context-specific, and unpredictable ways, more variation and personalization are necessary. We introduce POET, a real-time interactive tool that (1) automatically discovers dimensions of homogeneity in text-to-image generative models, (2) expands these dimensions to diversify the output space of generated images, and (3) learns from user feedback to personalize expansions. An evaluation with 28 users spanning four creative task domains demonstrated POET's ability to generate results with higher perceived diversity and to help users reach satisfaction in fewer prompts during creative tasks, thereby prompting them to deliberate and reflect more on a wider range of possible results during the co-creative process. Focusing on visual creativity, POET offers a first glimpse of how interaction techniques of future text-to-image generation tools may support and align with more pluralistic values and the needs of end users during the ideation stages of their work.
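Steps (1) and (2) of the abstract can be illustrated with a small sketch. This is not POET's published algorithm; it assumes each generated image has already been tagged with attribute values (for example by a captioning model), treats a dimension as "homogeneous" when the normalized Shannon entropy of its observed values falls below a threshold, and expands the prompt with values the model rarely produced. The attribute names, the 0.5 threshold, the `dim: value` prompt template, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def normalized_entropy(values: List[str], num_options: int) -> float:
    """Shannon entropy of the observed values, scaled to [0, 1] by log(num_options)."""
    if num_options < 2:
        return 0.0
    counts = Counter(values)
    total = len(values)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(num_options)


def homogeneous_dimensions(labels: List[Dict[str, str]],
                           options: Dict[str, List[str]],
                           threshold: float = 0.5) -> List[str]:
    """Return dimensions whose observed values vary little across the batch."""
    result = []
    for dim, choices in options.items():
        observed = [lab[dim] for lab in labels if dim in lab]
        if observed and normalized_entropy(observed, len(choices)) < threshold:
            result.append(dim)
    return result


def expand_prompt(prompt: str, dims: List[str], options: Dict[str, List[str]],
                  labels: List[Dict[str, str]]) -> List[str]:
    """Create prompt variants that request values not seen in the current batch."""
    variants = []
    for dim in dims:
        seen = {lab[dim] for lab in labels if dim in lab}
        for value in options[dim]:
            if value not in seen:
                variants.append(f"{prompt}, {dim}: {value}")
    return variants


# Hypothetical tags for three images generated from the prompt "a chair".
labels = [
    {"style": "photo", "setting": "office"},
    {"style": "photo", "setting": "park"},
    {"style": "photo", "setting": "street"},
]
options = {"style": ["photo", "watercolor", "sketch"],
           "setting": ["office", "park", "street"]}
dims = homogeneous_dimensions(labels, options)
print(dims)  # ['style']
print(expand_prompt("a chair", dims, options, labels))
# ['a chair, style: watercolor', 'a chair, style: sketch']
```

Entropy is used here because it is zero when every image shares one value and maximal when values are spread evenly, which matches the intuitive notion of a homogeneous dimension.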

Problem

Research questions and friction points this paper is trying to address.

Enhancing creativity in text-to-image generation outputs

Improving personalization through user feedback learning

Simplifying interaction for beginners in generative AI tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically discovers dimensions of homogeneity in text-to-image models

Expands dimensions to diversify image outputs

Learns from feedback to personalize expansions
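The third contribution, learning from feedback to personalize expansions, can be sketched with a simple multiplicative-weights update over candidate expansion values. This summary does not describe POET's actual personalization method, so the `PreferenceModel` class, the learning rate, and the update rule below are illustrative assumptions only.

```python
import random
from typing import Dict, List


class PreferenceModel:
    """Hypothetical per-user preference weights over expansion values."""

    def __init__(self, values: List[str], learning_rate: float = 0.5) -> None:
        if not 0 < learning_rate < 1:
            raise ValueError("learning_rate must be in (0, 1)")
        self.weights: Dict[str, float] = {v: 1.0 for v in values}
        self.lr = learning_rate

    def feedback(self, value: str, liked: bool) -> None:
        """Multiplicatively boost a liked value or shrink a disliked one."""
        factor = 1 + self.lr if liked else 1 - self.lr
        self.weights[value] *= factor

    def ranked(self) -> List[str]:
        """Values ordered from most to least preferred."""
        return sorted(self.weights, key=lambda v: self.weights[v], reverse=True)

    def sample(self, k: int, rng: random.Random) -> List[str]:
        """Draw k values (with replacement) in proportion to their weights."""
        values = list(self.weights)
        return rng.choices(values, weights=[self.weights[v] for v in values], k=k)


model = PreferenceModel(["watercolor", "sketch", "photo"])
model.feedback("watercolor", liked=True)   # weight 1.0 -> 1.5
model.feedback("photo", liked=False)       # weight 1.0 -> 0.5
print(model.ranked())  # ['watercolor', 'sketch', 'photo']
print(model.sample(3, random.Random(0)))
```

Sampling in proportion to the weights, rather than always picking the top value, keeps less-preferred expansions in play, which fits the tool's goal of encouraging exploration rather than converging on a single style.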
