DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of fine-grained, diagnostic evaluation benchmarks for subject-driven text-to-image generation models, which hinders accurate assessment of their performance across varying subject difficulties and prompt scenarios. To this end, the authors introduce a systematic benchmark comprising a hierarchical subject sampling mechanism and a joint taxonomy of subject difficulty and prompt context. They further propose a novel automatic metric, the Subject Identity Consistency Score (SICS), and validate its effectiveness through large-scale experiments. Evaluations across 19 state-of-the-art models demonstrate that SICS achieves a 9.4% higher correlation with human judgments compared to existing metrics, effectively uncovering latent deficiencies in subject fidelity and providing actionable insights for model diagnosis and future research directions.

📝 Abstract
Significant progress has been achieved in subject-driven text-to-image (T2I) generation, which aims to synthesize new images depicting target subjects according to user instructions. However, evaluating these models remains a significant challenge. Existing benchmarks exhibit critical limitations: 1) insufficient diversity and comprehensiveness in subject images, 2) inadequate granularity in assessing model performance across different subject difficulty levels and prompt scenarios, and 3) a profound lack of actionable insights and diagnostic guidance for subsequent model refinement. To address these limitations, we propose DSH-Bench, a comprehensive benchmark that enables systematic multi-perspective analysis of subject-driven T2I models through four principal innovations: 1) a hierarchical taxonomy sampling mechanism ensuring comprehensive subject representation across 58 fine-grained categories, 2) an innovative classification scheme categorizing both subject difficulty level and prompt scenario for granular capability assessment, 3) a novel Subject Identity Consistency Score (SICS) metric demonstrating a 9.4% higher correlation with human evaluation compared to existing measures in quantifying subject preservation, and 4) a comprehensive set of diagnostic insights derived from the benchmark, offering critical guidance for optimizing future model training paradigms and data construction strategies. Through an extensive empirical evaluation of 19 leading models, DSH-Bench uncovers previously obscured limitations in current approaches, establishing concrete directions for future research and development.
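The abstract reports that SICS correlates better with human evaluation than existing measures, but it does not say which correlation statistic is used. As a rough illustration only, the sketch below assumes Spearman rank correlation and compares two hypothetical metrics against hypothetical human ratings. All names and numbers here (`human`, `metric_a`, `metric_b`) are invented for the example and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human ratings for five generated images, and the
# scores two automatic metrics assign to the same images.
human = [1, 2, 3, 4, 5]
metric_a = [0.10, 0.35, 0.30, 0.70, 0.90]
metric_b = [0.50, 0.20, 0.60, 0.40, 0.55]

if __name__ == "__main__":
    print(f"metric_a vs human: {spearman(human, metric_a):.2f}")
    print(f"metric_b vs human: {spearman(human, metric_b):.2f}")
```

On this toy data `metric_a` tracks the human ordering more closely than `metric_b`. A comparison of this kind, run on real annotations, is one way a claim such as "higher correlation with human evaluation" could be checked.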
Problem

Research questions and friction points this paper is trying to address.

subject-driven text-to-image generation
benchmark
evaluation
subject identity consistency
model diagnostics
Innovation

Methods, ideas, or system contributions that make the work stand out.

subject-driven text-to-image generation
hierarchical taxonomy
difficulty-aware evaluation
Subject Identity Consistency Score
diagnostic benchmark
Zhenyu Hu
National University of Singapore
Revenue Management and Dynamic Pricing, Supply Chain Management, Dynamic Programming, Game Theory
Qing Wang
IBM Research China
computer vision, statistical signal processing, mobile communication
Te Cao
Tencent
Luo Liao
Tencent
Longfei Lu
Tencent
Liqun Liu
Tencent
Shuang Li
Tencent; Tsinghua University
Natural language processing, Large Language Models, Text to Image/Video
Hang Chen
Tencent
Mengge Xue
Tencent
Yuan Chen
Tencent
Chao Deng
Tencent
Peng Shu
Tencent
Huan Yu
Tencent
Jie Jiang
Tencent