Diffusion Probe: Generated Image Result Prediction Using CNN Probes

📅 2026-02-27
🤖 AI Summary
This work addresses the lack of efficient early-stage quality assessment in text-to-image diffusion models, which wastes computation whenever many candidate images have to be generated. The study reports a strong correlation between the cross-attention distributions in early denoising steps and the final image quality. Building on this finding, the authors propose a lightweight, model-agnostic framework for early quality prediction: statistical features are extracted from early cross-attention maps and fed into a compact CNN probe, which forecasts the quality of the finished image. Evaluated across multiple text-to-image models and quality metrics, the approach achieves a Pearson correlation coefficient (PCC) above 0.7 and an AUC-ROC above 0.9, which improves the efficiency and output quality of downstream tasks such as prompt refinement and seed selection.
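To make the two reported metrics concrete, here is a minimal pure-Python sketch of how PCC and AUC-ROC could be computed for a quality predictor. The function names, the toy scores, and the 0.45 threshold used to turn final quality into good/bad labels are illustrative assumptions and do not come from the paper.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random positive example receives a
    higher score than a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the paper): early predictions vs. final scores.
predicted = [0.2, 0.4, 0.5, 0.9]
final_quality = [0.1, 0.5, 0.4, 0.8]

# Assumed threshold that marks an image as "good" for the classification view.
labels = [1 if q >= 0.45 else 0 for q in final_quality]

pcc = pearson_cc(predicted, final_quality)
auc = auc_roc(predicted, labels)  # 0.75 for this toy data
```

PCC measures how well the early prediction tracks the final score on a continuous scale, while AUC-ROC measures how well it separates good from bad generations regardless of the score scale.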

📝 Abstract
Text-to-image (T2I) diffusion models lack an efficient mechanism for early quality assessment, leading to costly trial-and-error in multi-generation scenarios such as prompt iteration, agent-based generation, and Flow-GRPO. We reveal a strong correlation between early diffusion cross-attention distributions and final image quality. Based on this finding, we introduce Diffusion Probe, a framework that leverages internal cross-attention maps as predictive signals. We design a lightweight predictor that maps statistical properties of the cross-attention extracted from the initial denoising steps to the final image's overall quality. This enables accurate forecasting of image quality across diverse evaluation metrics long before full synthesis is complete. We validate Diffusion Probe across a wide range of settings: on multiple T2I models, early denoising windows, resolutions, and quality metrics, it achieves strong correlation (PCC > 0.7) and high classification performance (AUC-ROC > 0.9). This reliability translates into practical gains. By enabling early quality-aware decisions in workflows such as prompt optimization, seed selection, and accelerated RL training, the probe supports more targeted sampling and avoids spending computation on low-potential generations, which reduces computational overhead while improving final output quality. Diffusion Probe is model-agnostic, efficient, and broadly applicable, offering a practical solution for improving T2I generation efficiency through early quality prediction.
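The seed-selection workflow mentioned in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: `early_score` is a hypothetical callable standing in for "run a few denoising steps and apply the probe", and the cost model assumes that the early steps of a kept seed are reused when its generation is completed.

```python
from typing import Callable, Iterable, List


def select_seeds(
    seeds: Iterable[int],
    early_score: Callable[[int], float],
    keep: int,
) -> List[int]:
    """Rank candidate seeds by their early predicted quality and return the
    `keep` highest-scoring ones, best first."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    scored = [(early_score(seed), seed) for seed in seeds]
    # Sort by predicted quality only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seed for _, seed in scored[:keep]]


def denoising_steps_used(
    num_candidates: int, keep: int, early_steps: int, total_steps: int
) -> int:
    """Total denoising steps when every candidate runs `early_steps` and only
    the kept ones run the remaining steps (assumes early steps are reused)."""
    if not 0 < early_steps <= total_steps:
        raise ValueError("early_steps must be in (0, total_steps]")
    if not 1 <= keep <= num_candidates:
        raise ValueError("keep must be in [1, num_candidates]")
    return num_candidates * early_steps + keep * (total_steps - early_steps)


# Hypothetical probe outputs for four seeds (illustrative values only).
fake_probe_scores = {11: 0.3, 22: 0.9, 33: 0.6, 44: 0.1}
best = select_seeds(fake_probe_scores, fake_probe_scores.__getitem__, keep=2)

# Illustrative budget: 8 candidates, keep 2, 5 early steps out of 50.
with_probe = denoising_steps_used(8, 2, 5, 50)  # 8*5 + 2*45 = 130
without_probe = 8 * 50                          # 400
```

Under these assumed numbers, pruning after the early window spends 130 denoising steps instead of 400; the actual savings depend on how early the probe is applied and on its own inference cost, which this sketch ignores.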
Problem

Research questions and friction points this paper is trying to address.

text-to-image diffusion
early quality assessment
computational efficiency
multi-generation scenarios
image quality prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Probe
cross-attention
early quality prediction
text-to-image generation
computational efficiency
Benlei Cui
Alibaba Group
Bukun Huang
Laboratory for Statistical Monitoring and Intelligent Governance of Common Prosperity, School of Statistics and Data Science, Zhejiang Gongshang University
Zhizeng Ye
Laboratory for Statistical Monitoring and Intelligent Governance of Common Prosperity, School of Statistics and Data Science, Zhejiang Gongshang University
Xuemei Dong
Laboratory for Statistical Monitoring and Intelligent Governance of Common Prosperity, School of Statistics and Data Science, Zhejiang Gongshang University
Tuo Chen
Southeast University
Hui Xue
Alibaba Group
Dingkang Yang
ByteDance
Multimodal Learning, Generative AI, Embodied AI
Longtao Huang
Alibaba Group
Knowledge Graph, Service Computing, Data Mining
Jingqun Tang
ByteDance Inc.
Computer Vision, Document Intelligence, MLLM, Multimodal Generative Models
Haiwen Hong
Alibaba Group