Trade-offs in Image Generation: How Do Different Dimensions Interact?

📅 2025-07-29
🤖 AI Summary
Evaluation of text-to-image (T2I) and image-to-image (I2I) generative models lacks fine-grained, quantitative tools for characterizing trade-offs among performance dimensions such as quality, alignment, diversity, and robustness. Two gaps are responsible: no existing dataset supports fine-grained quantification of these trade-offs, and a single metric is typically used to cover multiple dimensions. To address this, we propose TRIG-Bench, a benchmark of 40,200 samples spanning 10 dimensions and 132 pairwise dimensional subsets, and TRIGScore, a VLM-as-judge metric that adapts automatically to each dimension. We further introduce the Relation Recognition System, which generates the Dimension Trade-off Map (DTM), a visualization of the trade-offs among a model's capabilities. An evaluation of 14 models across T2I and I2I tasks indicates that the DTM gives a comprehensive picture of these trade-offs, and that fine-tuning guided by the DTM can mitigate dimension-specific weaknesses and improve overall performance.

📝 Abstract
Model performance in text-to-image (T2I) and image-to-image (I2I) generation often depends on multiple aspects, including quality, alignment, diversity, and robustness. However, models' complex trade-offs among these dimensions have rarely been explored due to (1) the lack of datasets that allow fine-grained quantification of these trade-offs, and (2) the use of a single metric for multiple dimensions. To bridge this gap, we introduce TRIG-Bench (Trade-offs in Image Generation), which spans 10 dimensions (Realism, Originality, Aesthetics, Content, Relation, Style, Knowledge, Ambiguity, Toxicity, and Bias), contains 40,200 samples, and covers 132 pairwise dimensional subsets. Furthermore, we develop TRIGScore, a VLM-as-judge metric that automatically adapts to various dimensions. Based on TRIG-Bench and TRIGScore, we evaluate 14 models across T2I and I2I tasks. In addition, we propose the Relation Recognition System to generate the Dimension Trade-off Map (DTM) that visualizes the trade-offs among model-specific capabilities. Our experiments demonstrate that DTM consistently provides a comprehensive understanding of the trade-offs between dimensions for each type of generative model. Notably, we show that the model's dimension-specific weaknesses can be mitigated through fine-tuning on DTM to enhance overall performance. Code is available at: https://github.com/fesvhtr/TRIG

Problem

Research questions and friction points this paper is trying to address.

Trade-offs among image generation dimensions (quality, alignment, diversity, robustness) have rarely been explored
Existing datasets do not allow fine-grained quantification of these trade-offs
A single metric is commonly used to evaluate multiple dimensions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces TRIG-Bench for multi-dimensional trade-off analysis
Develops TRIGScore, an adaptive VLM-as-judge evaluation metric
Proposes the Relation Recognition System to generate the Dimension Trade-off Map (DTM)
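
As a rough illustration of what a pairwise trade-off analysis might involve, the sketch below labels each dimension pair by the Pearson correlation of its two per-sample scores. It is not the paper's Relation Recognition System or TRIGScore: the function names, the input layout, the correlation criterion, and the 0.3 threshold are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def classify_pairs(pair_scores, threshold=0.3):
    """Label each dimension pair as 'trade-off', 'synergy', or 'neutral'.

    pair_scores maps (dim_a, dim_b) to a list of (score_a, score_b) tuples,
    one tuple per generated sample. This layout and the threshold are
    hypothetical choices, not taken from the paper.
    """
    labels = {}
    for pair, samples in pair_scores.items():
        r = pearson([s[0] for s in samples], [s[1] for s in samples])
        if r <= -threshold:
            label = "trade-off"   # one dimension tends to fall as the other rises
        elif r >= threshold:
            label = "synergy"     # the two dimensions tend to rise together
        else:
            label = "neutral"
        labels[pair] = (r, label)
    return labels


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    toy = {
        ("Realism", "Originality"): [(0.1, 0.9), (0.4, 0.6), (0.8, 0.2)],
        ("Content", "Style"): [(0.2, 0.3), (0.5, 0.6), (0.9, 0.8)],
    }
    for pair, (r, label) in classify_pairs(toy).items():
        print(pair, round(r, 2), label)
```

A table of such labels over all evaluated pairs is one simple way to picture a per-model trade-off map; the actual DTM construction is described in the paper and its repository.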