🤖 AI Summary
Deep learning models often inherit implicit biases, and existing unsupervised debiasing methods rely on latent-space clustering to generate pseudo-labels, which lack semantic interpretability and hinder expert validation. To address this, we propose the first semantics-driven framework for diagnosing implicit bias: it requires neither human annotations nor prior assumptions about bias types. Our approach combines text-guided latent-space disentanglement, task-relevance distillation, bias-semantic mapping, and a generative explanation module to automatically identify and semantically name the non-representative bias features the model has actually learned. This enables explicit, interpretable bias identification and attribution, supporting both in-training intervention and post-hoc verification. Evaluated across multiple benchmarks, our method significantly improves bias detection accuracy and interpretability, demonstrating strong generalizability and practical applicability.
📝 Abstract
In the last few years, the broad applicability of deep learning to downstream tasks and its end-to-end training capabilities have raised growing concerns about potential biases toward specific, non-representative patterns. Many works on unsupervised debiasing leverage the tendency of deep models to learn "easier" samples, for example by clustering the latent space to obtain bias pseudo-labels. However, interpreting such pseudo-labels is not trivial, especially for a non-expert end user, as they provide no semantic information about the bias features. To address this issue, we introduce "Say My Name" (SaMyNa), the first tool to identify biases within deep models semantically. Unlike existing methods, our approach focuses on biases actually learned by the model. Our text-based pipeline enhances explainability and supports debiasing efforts: applicable either during training or for post-hoc validation, our method can disentangle task-related information, positioning itself as a tool for bias analysis. Evaluation on traditional benchmarks demonstrates its effectiveness in detecting biases and even disclaiming them, showcasing its broad applicability for model diagnosis.
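To make the core idea concrete, here is a minimal, purely illustrative sketch of semantic bias naming: a bias direction extracted from a model's latent space is matched against text-concept embeddings in a shared image-text space, and the best-aligned concept becomes the bias's name. All function names, vectors, and concept labels below are hypothetical, not the paper's actual pipeline or API.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def name_bias(bias_direction, concept_embeddings):
    """Return the text concept whose embedding best aligns with the
    (task-disentangled) bias direction, plus all similarity scores."""
    scores = {name: cosine_sim(bias_direction, vec)
              for name, vec in concept_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings in a shared image-text space (e.g. CLIP-like);
# in practice these would come from a text encoder over candidate
# bias keywords. Values here are made up for illustration.
concepts = {
    "background color": np.array([0.9, 0.1, 0.0]),
    "texture":          np.array([0.1, 0.9, 0.1]),
    "object shape":     np.array([0.0, 0.1, 0.9]),
}
# A direction the (hypothetically biased) model over-relies on.
bias_dir = np.array([0.85, 0.2, 0.05])

label, scores = name_bias(bias_dir, concepts)
print(label)  # the semantic name assigned to the bias
```

In this toy setup the bias direction aligns most strongly with "background color", so that string is returned as the human-readable bias name; a real system would draw the candidate concepts from a large vocabulary and a learned text encoder.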