DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of systematic alignment evaluation and multidimensional optimization for text-to-image (T2I) models, this paper introduces DETONATE—the first large-scale, fine-grained alignment benchmark comprising 100K image-text pairs—focused on racial, gender, and disability-related societal biases, and comprehensively assessing intent fidelity, safety, and fairness. The authors propose DPO-Kernels: a novel framework featuring an embedding-probability hybrid loss, RBF/polynomial/wavelet kernelized representations, and Wasserstein/Rényi divergences as KL-regularization substitutes, alongside a geometric Alignment Quality Index (AQI) quantifying latent-space separability. A Heavy-Tailed Self-Regularization (HT-SR) analysis further shows that DPO-Kernels maintain strong generalization bounds. Extensive evaluation on SDXL, SD3.5L, and Midjourney demonstrates significant improvements in robustness and fairness; AQI effectively exposes latent safety vulnerabilities. Both code and dataset are fully open-sourced.

📝 Abstract
Alignment is crucial for text-to-image (T2I) models to ensure that generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO), prominent in large language models (LLMs), is extending its influence to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension enhancing alignment across three dimensions: (i) Hybrid Loss, integrating embedding-based objectives with traditional probability-based loss for improved optimization; (ii) Kernelized Representations, employing Radial Basis Function (RBF), Polynomial, and Wavelet kernels for richer feature transformations and better separation between safe and unsafe inputs; and (iii) Divergence Selection, expanding beyond DPO's default Kullback-Leibler (KL) regularizer by incorporating Wasserstein and Rényi divergences for enhanced stability and robustness. We introduce DETONATE, the first large-scale benchmark of its kind, comprising approximately 100K curated image pairs categorized as chosen and rejected. DETONATE encapsulates three axes of social bias and discrimination: Race, Gender, and Disability. Prompts are sourced from hate speech datasets, with images generated by leading T2I models including Stable Diffusion 3.5 Large, Stable Diffusion XL, and Midjourney. Additionally, we propose the Alignment Quality Index (AQI), a novel geometric measure quantifying latent-space separability of safe/unsafe image activations, revealing hidden vulnerabilities. Empirically, we demonstrate that DPO-Kernels maintain strong generalization bounds via Heavy-Tailed Self-Regularization (HT-SR). DETONATE and complete code are publicly released.
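The Hybrid Loss described above combines DPO's probability-based logistic objective with an embedding-similarity term, where kernelized representations (e.g., an RBF kernel) score prompt-image closeness. A minimal sketch of this idea follows; the function names, the mixing weight `alpha`, and the kernel bandwidth `gamma` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) via log-sum-exp
    return -np.logaddexp(0.0, -z)

def rbf_kernel(x, y, gamma=1.0):
    # RBF similarity between row-aligned batches of embeddings
    return np.exp(-gamma * np.sum((x - y) ** 2, axis=-1))

def hybrid_dpo_loss(logp_c, logp_r, ref_logp_c, ref_logp_r,
                    emb_prompt, emb_c, emb_r,
                    beta=0.1, alpha=0.5, gamma=1.0):
    # Probability term: vanilla DPO logistic loss on implicit reward margins
    margin = beta * ((logp_c - ref_logp_c) - (logp_r - ref_logp_r))
    prob_loss = -log_sigmoid(margin).mean()
    # Embedding term: reward higher kernel similarity between the prompt
    # embedding and the chosen image than the rejected image
    k_margin = (rbf_kernel(emb_prompt, emb_c, gamma)
                - rbf_kernel(emb_prompt, emb_r, gamma))
    emb_loss = -log_sigmoid(k_margin).mean()
    return alpha * prob_loss + (1.0 - alpha) * emb_loss
```

When the policy favors the chosen image in both log-probability and embedding space, both margins are positive and the combined loss shrinks; swapping chosen and rejected inputs raises it.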
Problem

Research questions and friction points this paper is trying to address.

Standard DPO's probability-only objective and default KL regularizer limit optimization quality for T2I alignment.
No large-scale benchmark exists for evaluating social bias (race, gender, disability) in generated images.
No geometric measure quantifies the latent-space separability of safe and unsafe image activations.
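The separability question behind AQI can be illustrated with a simple cluster-geometry score. The paper's exact AQI formula is not reproduced here; this is a hedged, silhouette-style sketch that compares between-centroid distance to within-cluster spread, with the function name and the ratio form being assumptions for illustration.

```python
import numpy as np

def separability_index(safe, unsafe):
    # Ratio of between-centroid distance to mean within-cluster spread.
    # Higher values mean safe and unsafe activations occupy more
    # clearly separated regions of the latent space.
    mu_s, mu_u = safe.mean(axis=0), unsafe.mean(axis=0)
    between = np.linalg.norm(mu_s - mu_u)
    within = (np.linalg.norm(safe - mu_s, axis=1).mean()
              + np.linalg.norm(unsafe - mu_u, axis=1).mean()) / 2.0
    return between / (within + 1e-12)
```

A low score on a model whose outputs look superficially safe would flag the kind of hidden vulnerability the abstract says AQI is meant to reveal.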
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Loss integrates embedding and probability objectives
Kernelized Representations use RBF, Polynomial, Wavelet kernels
Divergence Selection includes Wasserstein and Rényi divergences
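The divergence-selection idea above swaps DPO's KL regularizer for alternatives such as the Rényi divergence, which interpolates a family of divergences and recovers KL in the limit α → 1. A minimal sketch for discrete distributions follows; how the paper plugs the divergence into the T2I training objective is not shown here.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    # Rényi divergence D_alpha(p || q) for discrete distributions
    # (alpha > 0, alpha != 1); tends to KL(p || q) as alpha -> 1.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

def kl_divergence(p, q):
    # Kullback-Leibler divergence, DPO's default regularizer
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))
```

Choosing α below or above 1 makes the regularizer respectively more forgiving or more sensitive to regions where the policy overshoots the reference, since Rényi divergence is non-decreasing in α.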