Seeing Twice: How Side-by-Side T2I Comparison Changes Auditing Strategies

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Non-AI-expert stakeholders struggle to audit biases and harmful outputs in text-to-image (T2I) models. Method: The paper introduces a "contrast-first" auditing workflow and presents MIRAGE, a web-based tool that lets users select up to four T2I models, view their outputs side-by-side, and give feedback on model performance on a single screen. Results: In a user study with fifteen participants, the side-by-side view shifted most users from evaluating individual images toward recognizing general model output patterns; several participants attributed persistent "model personalities" (e.g., cartoonish, saturated) that shaped their expectations for future prompts, and bilingual participants surfaced a language-fidelity gap in which English prompts produced more accurate images than Portuguese or Chinese ones. The core contribution is evidence that simple comparative interfaces can accelerate bias discovery and reshape how non-specialist stakeholders reason about generative models.

📝 Abstract
While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and utility. A small but growing line of research has explored tools and processes to better engage non-AI expert users in auditing generative AI systems. In this work, we present the design and evaluation of MIRAGE, a web-based tool exploring a "contrast-first" workflow that allows users to pick up to four different text-to-image (T2I) models, view their images side-by-side, and provide feedback on model performance on a single screen. In our user study with fifteen participants, we used four predefined models for consistency, with only a single model initially being shown. We found that most participants shifted from analyzing individual images to general model output patterns once the side-by-side step appeared with all four models; several participants coined persistent "model personalities" (e.g., cartoonish, saturated) that helped them form expectations about how each model would behave on future prompts. Bilingual participants also surfaced a language-fidelity gap, as English prompts produced more accurate images than Portuguese or Chinese, an issue often overlooked when dealing with a single model. These findings suggest that simple comparative interfaces can accelerate bias discovery and reshape how people think about generative models.
Problem

Research questions and friction points this paper is trying to address.

The study investigates how side-by-side T2I model comparison changes user auditing strategies
It examines how comparative interfaces help non-expert users discover model biases
It explores language fidelity gaps in multilingual text-to-image generation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Side-by-side comparison of multiple T2I models
Contrast-first workflow for model auditing
Web tool enabling simultaneous four-model evaluation