Vipera: Blending Visual and LLM-Driven Guidance for Systematic Auditing of Text-to-Image Generative AI

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image generative models frequently produce biased or harmful content, yet existing auditing methods lack structured approaches for exploring their vast output spaces. Method: We propose Vipera, an interactive auditing framework that integrates visual cues, specifically scene graphs, with large language models (LLMs). It hierarchically models auditing criteria, generates LLM-guided suggestions grounded in visual semantics, and provides a collaborative analysis interface supporting image sensemaking and navigable exploration. Contribution/Results: In a controlled study with 24 participants experienced in AI auditing, Vipera significantly improved the efficiency of organizing auditing criteria, surfaced previously neglected bias dimensions, and made large-scale model outputs easier to navigate. By unifying visual reasoning and LLM-based guidance within a human-in-the-loop interface, Vipera offers a scalable, interpretable, and structured paradigm for safety assessment of generative AI systems.

📝 Abstract
Despite their increasing capabilities, text-to-image generative AI systems are known to produce biased, offensive, and otherwise problematic outputs. While recent advancements have supported testing and auditing of generative AI, existing auditing methods still face challenges in effectively exploring the vast space of AI-generated outputs in a structured way. To address this gap, we conducted formative studies with five AI auditors and synthesized five design goals for supporting systematic AI audits. Based on these insights, we developed Vipera, an interactive auditing interface that employs multiple visual cues including a scene graph to facilitate image sensemaking and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, Vipera leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. Through a controlled experiment with 24 participants experienced in AI auditing, we demonstrate Vipera's effectiveness in helping auditors navigate large AI output spaces and organize their analyses while engaging with diverse criteria.
Problem

Research questions and friction points this paper is trying to address.

Auditing text-to-image AI for biased outputs systematically
Exploring vast AI-generated content space with structured guidance
Enhancing auditor analysis with visual and LLM-driven suggestions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene graph visual cues for hierarchical image analysis
LLM-powered suggestions to explore new audit directions
Interactive interface blending visual and language guidance
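To make the innovations above concrete, here is a minimal, hypothetical sketch of how a tool in this vein might represent a scene graph extracted from a generated image and derive audit-criterion suggestions from its entities. All names and structure are illustrative assumptions; the paper's actual implementation prompts an LLM with the visual semantics rather than using the rule-based stand-in shown here.

```python
# Hypothetical sketch (not Vipera's actual code): a scene graph over a
# generated image, plus a toy stand-in for LLM-guided criterion suggestions.
from dataclasses import dataclass, field

@dataclass
class SceneGraphNode:
    label: str                                   # e.g. "person", "kitchen"
    attributes: list[str] = field(default_factory=list)
    relations: list[tuple[str, "SceneGraphNode"]] = field(default_factory=list)

def suggest_criteria(node: SceneGraphNode) -> list[str]:
    """Propose audit directions from scene-graph entities.
    A real system would prompt an LLM with the graph's semantics;
    this rule-based version only illustrates the data flow."""
    suggestions = [f"bias in depiction of '{node.label}'"]
    for attr in node.attributes:
        suggestions.append(f"stereotypical attribute '{attr}' for '{node.label}'")
    for _relation, child in node.relations:
        suggestions.extend(suggest_criteria(child))
    return suggestions

# Example: an image of a person wearing an apron in a kitchen.
person = SceneGraphNode("person", attributes=["apron"])
scene = SceneGraphNode("kitchen", relations=[("contains", person)])
print(suggest_criteria(scene))
```

An auditor could then group the suggested directions under parent criteria (e.g. nesting "stereotypical attribute" checks under a broader "occupational gender bias" node), matching the hierarchical organization the interface supports.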