A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks

📅 2025-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether humans can accurately regenerate target images via iterative prompt optimization in AI-based image generation, and systematically evaluates the reliability of mainstream image similarity metrics (LPIPS, CLIP-Score, DINOv2) throughout this process—particularly their alignment with human perceptual judgments. Method: Combining controlled user studies, subjective similarity ratings, and rigorous statistical analysis, the work quantifies how progressive manual prompt refinement affects regeneration fidelity. Contribution/Results: It provides the first empirical validation that iterative human-in-the-loop prompt engineering significantly improves image regeneration quality—both subjectively and across most objective metrics. CLIP-Score demonstrates strong correlation with human judgments (r > 0.85), whereas LPIPS shows notably weaker alignment. The study establishes the first empirically grounded benchmark for human–machine similarity alignment, offering both methodological foundations and evaluation standards for interpretable AI image editing and human–AI collaborative prompt engineering.
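The metric-validation step described above, checking whether an image similarity metric tracks subjective human judgments, can be sketched as a simple correlation analysis. Everything below is an illustrative stand-in: the embeddings are random vectors rather than real CLIP features, and the human ratings are hypothetical, not the paper's data.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """CLIP-Score-style similarity: cosine between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_r(x, y) -> float:
    """Pearson correlation between metric scores and human ratings."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Stand-in data: a target embedding and regenerated-image embeddings that
# drift closer to it across refinement iterations (decreasing noise scale).
# In a real study these would come from an image encoder such as CLIP.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
regenerated = [target + rng.normal(scale=s, size=8) for s in (2.0, 1.0, 0.5, 0.1)]

metric_scores = [cosine_similarity(target, r) for r in regenerated]
human_ratings = [1, 2, 4, 5]  # hypothetical 1-5 similarity judgments

print(pearson_r(metric_scores, human_ratings))
```

A correlation near 1 would indicate the metric orders the iterations the same way human raters do; the paper reports r > 0.85 for CLIP-Score but notably weaker alignment for LPIPS.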

📝 Abstract
With AI-generated content becoming ubiquitous across the web, social media, and other digital platforms, it is vital to examine how such content is inspired and generated. The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes. This study focuses on the relatively underexplored concept of image regeneration using AI, in which a human operator attempts to closely recreate a specific target image by iteratively refining their prompt. Image regeneration is distinct from ordinary image generation, which lacks any predefined visual reference. A separate challenge lies in determining whether existing image similarity metrics (ISMs) can provide reliable, objective feedback in iterative workflows, given that we do not fully understand whether subjective human judgments of similarity align with these metrics. Consequently, we must first validate their alignment with human perception before assessing their potential as a feedback mechanism in the iterative prompt refinement process. To address these research gaps, we present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets, while also examining whether ISMs capture the same improvements perceived by human observers. Our findings suggest that incremental prompt adjustments substantially improve alignment, verified through both subjective evaluations and quantitative measures, underscoring the broader potential of iterative workflows to enhance generative AI content creation across various application domains.
Problem

Research questions and friction points this paper is trying to address.

Evaluates iterative human-driven prompt refinement for AI image regeneration
Assesses alignment of image similarity metrics with human perception
Examines effectiveness of incremental prompt adjustments in improving output
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative human-driven prompt refinement for images
Validation of image similarity metrics with human perception
Structured user study on prompt refinement efficacy