TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation

📅 2025-03-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address ambiguous user intent and misaligned feedback in text-to-image generation, this paper proposes a two-stage interactive framework: an initial image generation stage followed by an iterative refinement stage comprising Dialogue-to-Prompt (D2P) conversion, Feedback-Reflection (FR), and Adaptive Optimization (AO) of the generation process. It introduces a dialogue-driven co-optimization mechanism, described as the first of its kind, that dynamically models user intent and adjusts the generation process in real time, preserving prompt fidelity while improving personalized alignment. Experiments report CLIP and BLIP alignment scores of 0.338 and 0.336, respectively; a human preference rate of 33.6%, a 27.4-percentage-point improvement over a GPT-4-enhanced baseline (6.2%); 88% user satisfaction after eight iterations; and a 40% reduction in required iterations for fashion design tasks.
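For readers unfamiliar with the alignment metrics above: CLIP-based scores are commonly computed as the cosine similarity between an image embedding and a text embedding. The summary does not say exactly how the paper computes its scores, so the following is only a minimal sketch of that common definition. The function name and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for an image embedding and a text embedding
# (hypothetical values; real CLIP embeddings have hundreds of dimensions).
image_embedding = [3.0, 4.0]
text_embedding = [4.0, 3.0]
score = cosine_similarity(image_embedding, text_embedding)
```

Under this definition a score lies in [-1, 1], and higher values indicate closer agreement between the image and the prompt.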

πŸ“ Abstract
Although text-to-image generation technologies have made significant advancements, they still face challenges when dealing with ambiguous prompts and aligning outputs with user intent. Our proposed framework, TDRI (Two-Phase Dialogue Refinement and Co-Adaptation), addresses these issues by enhancing image generation through iterative user interaction. It consists of two phases: the Initial Generation Phase, which creates base images from user prompts, and the Interactive Refinement Phase, which integrates user feedback through three key modules. The Dialogue-to-Prompt (D2P) module transforms user feedback into actionable prompts, improving the alignment between user intent and model input. The Feedback-Reflection (FR) module evaluates generated outputs against user expectations, identifying discrepancies and facilitating improvements. The Adaptive Optimization (AO) module fine-tunes the generation process, balancing user preferences against prompt fidelity to keep results consistently high in quality. Experimental results show that TDRI outperforms existing methods, achieving 33.6% human preference, compared to 6.2% for GPT-4 augmentation, and the highest CLIP and BLIP alignment scores (0.338 and 0.336, respectively). In iterative feedback tasks, user satisfaction increased to 88% after 8 rounds, with diminishing returns beyond 6 rounds. Furthermore, TDRI reduces the number of iterations and improves personalization in the creation of fashion products. By streamlining the creative process and improving alignment with user preferences, TDRI shows strong potential for a wide range of applications in the creative and industrial domains.
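The two-phase flow described in the abstract can be sketched as a simple loop. This is only an illustration of the control flow under stated assumptions, not the authors' implementation: every function below is a hypothetical stand-in (the image "generator" returns a string, FR is a crude substring check, and AO adjusts a single made-up `guidance` parameter). The default of 8 rounds mirrors the evaluation setting mentioned in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    prompt: str
    guidance: float = 7.5  # hypothetical generation parameter tuned by AO
    images: List[str] = field(default_factory=list)


def generate(prompt: str, guidance: float) -> str:
    # Placeholder for a text-to-image model; returns a description string.
    return f"image({prompt}; guidance={guidance:.2f})"


def dialogue_to_prompt(prompt: str, feedback: str) -> str:
    # D2P stand-in: append the user's request to the working prompt.
    return f"{prompt}, {feedback.strip()}"


def feedback_reflection(image: str, feedback: str) -> float:
    # FR stand-in: fraction of feedback words that do not appear
    # (as substrings) in the description of the last image.
    words = feedback.lower().split()
    missing = [w for w in words if w not in image.lower()]
    return len(missing) / len(words) if words else 0.0


def adaptive_optimization(guidance: float, discrepancy: float) -> float:
    # AO stand-in: raise guidance when the last image missed the feedback.
    return min(guidance + 2.0 * discrepancy, 15.0)


def run_tdri_sketch(prompt: str,
                    next_feedback: Callable[[str], str],
                    max_rounds: int = 8) -> Session:
    session = Session(prompt=prompt)
    # Phase 1: initial generation from the raw user prompt.
    session.images.append(generate(session.prompt, session.guidance))
    # Phase 2: interactive refinement driven by user feedback.
    for _ in range(max_rounds):
        feedback = next_feedback(session.images[-1])
        if not feedback:  # empty feedback means the user is satisfied
            break
        discrepancy = feedback_reflection(session.images[-1], feedback)
        session.prompt = dialogue_to_prompt(session.prompt, feedback)
        session.guidance = adaptive_optimization(session.guidance, discrepancy)
        session.images.append(generate(session.prompt, session.guidance))
    return session


# Usage: one round of feedback, then the user stops.
replies = iter(["red collar", ""])
result = run_tdri_sketch("a denim jacket", lambda image: next(replies))
```

In a real system each stand-in would be replaced by a model call: a language model for D2P, a vision-language model for FR, and a tuning step over the generator's parameters for AO.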
Problem

Research questions and friction points this paper is trying to address.

Enhances image generation with iterative user interaction
Improves alignment between user intent and model output
Reduces iterations and boosts personalization in fashion design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase dialogue refinement for image generation
Dialogue-to-Prompt module converts user feedback into actionable prompts
Adaptive Optimization balances preferences and fidelity
Yuheng Feng
Xidian University, No. 2 Taibai South Road, Xi'an, 710071, Shaanxi, China
Jianhui Wang
University of Electronic Science and Technology of China, Qingshuihe Campus, 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, Sichuan, China
Kun Li
Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, Sepang, 43900, Selangor, Malaysia
Sida Li
Undergraduate, Peking University
Multimodal LLM, Stable diffusion
Tianyu Shi
University of Toronto
Reinforcement learning, Intelligent Transportation System, Large Language Models, AI, LLM agent
Haoyue Han
Shenzhen International Graduate School, Tsinghua University, University Town of Shenzhen, Nanshan District, Shenzhen, 518055, Guangdong, China
Miao Zhang
Shenzhen International Graduate School, Tsinghua University, University Town of Shenzhen, Nanshan District, Shenzhen, 518055, Guangdong, China
Xueqian Wang
Tsinghua University
Information Fusion, Target Detection, Radar Imaging, Image Processing