Twin Co-Adaptive Dialogue for Progressive Image Generation

📅 2025-04-21

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

Text-to-image generation often suffers from output misalignment due to ambiguous user prompts. To address this, we propose a dual-agent synchronous co-adaptive dialogue framework that models image generation as a dynamic, iterative human-AI collaborative optimization process: one agent performs conditional editing and latent-space feedback-driven fine-tuning, while the other models multi-turn dialogue semantics to resolve prompt ambiguity. Our approach is the first to enable joint co-adaptation of the generator and dialogue policy across both latent and semantic spaces—without requiring additional annotations or task-specific pretraining. Experiments demonstrate that our method significantly reduces user trial-and-error iterations (by 42% on average), improves intent alignment and visual fidelity, and achieves state-of-the-art performance on multiple human-AI collaborative image generation benchmarks.

Technology Category

Application Category

📝 Abstract

Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, iterative workflow where an intelligent dialogue agent continuously interacts with the user. Initially, a base image is generated from the user's prompt. Then, through a series of synchronized dialogue exchanges, the system adapts and optimizes the image according to evolving user feedback. The co-adaptive process allows the system to progressively narrow down ambiguities and better align with user intent. Experiments demonstrate that Twin-Co not only enhances user experience by reducing trial-and-error iterations but also improves the quality of the generated images, streamlining the creative process across various applications.

Problem

Research questions and friction points this paper is trying to address.

Handling ambiguities in user prompts for image generation

Progressive refinement of images through synchronized dialogue

Reducing trial-and-error iterations to align with user intent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synchronized co-adaptive dialogue refines images

Dynamic iterative workflow with user feedback

Progressive ambiguity reduction aligns user intent

🔎 Similar Papers

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining