HInter: Exposing Hidden Intersectional Bias in Large Language Models

📅 2025-03-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of detecting intersectional bias (discrimination arising from combinations of protected attributes, e.g., race × gender) in large language models (LLMs). It proposes HInter, an automated testing framework that combines mutation analysis, dependency parsing, and metamorphic testing: test inputs are generated by systematically mutating sentences, validated against a syntax-aware dependency invariant to suppress false positives, and checked for bias by comparing LLM responses on the original and mutated sentences. The key findings are threefold: (1) 16.62% of the detected intersectional bias errors are hidden, meaning the corresponding single-attribute (atomic) cases trigger no bias and would pass single-attribute evaluation; (2) the dependency invariant reduces false positives (invalid test inputs) by an order of magnitude; and (3) across six LLM architectures and 18 models, including GPT-3.5, Llama-2, and BERT, 14.61% of the inputs HInter generates expose intersectional bias.
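
The mutation step described above can be illustrated with a small, self-contained example. This is a minimal sketch assuming word-level substitution and insertion operators; the attribute word lists and helper names below are hypothetical and are not HInter's actual mutation operators.

```python
# Hypothetical illustration of composing atomic attribute mutations (gender, race)
# into intersectional mutants. Word lists and helpers are illustrative only.
GENDER_SWAPS = {"he": "she", "his": "her", "man": "woman"}
RACE_INSERTS = ["Black", "Asian", "Hispanic"]  # adjectives placed before a person noun

def mutate_gender(sentence: str) -> str:
    """Atomic mutation: swap gendered words."""
    return " ".join(GENDER_SWAPS.get(w, w) for w in sentence.split())

def mutate_race(sentence: str, adjective: str) -> str:
    """Atomic mutation: qualify a person noun with a racial adjective."""
    return " ".join(
        f"{adjective} {w}" if w in ("man", "woman") else w
        for w in sentence.split()
    )

def intersectional_mutants(sentence: str):
    """Compose the two atomic mutations to obtain race x gender mutants."""
    for adjective in RACE_INSERTS:
        yield mutate_race(mutate_gender(sentence), adjective)

for mutant in intersectional_mutants("The man was praised for his work"):
    print(mutant)  # e.g. "The Black woman was praised for her work"
```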

📝 Abstract
Large Language Models (LLMs) may portray discrimination towards certain individuals, especially those characterized by multiple attributes (aka intersectional bias). Discovering intersectional bias in LLMs is challenging, as it involves complex inputs on multiple attributes (e.g., race and gender). To address this challenge, we propose HInter, a test technique that synergistically combines mutation analysis, dependency parsing, and metamorphic oracles to automatically detect intersectional bias in LLMs. HInter generates test inputs by systematically mutating sentences using multiple mutations, validates inputs via a dependency invariant, and detects biases by checking the LLM response on the original and mutated sentences. We evaluate HInter using six LLM architectures and 18 LLM models (GPT-3.5, Llama-2, BERT, etc.) and find that 14.61% of the inputs generated by HInter expose intersectional bias. Results also show that our dependency invariant reduces false positives (incorrect test inputs) by an order of magnitude. Finally, we observed that 16.62% of intersectional bias errors are hidden, meaning that their corresponding atomic cases do not trigger biases. Overall, this work emphasizes the importance of testing LLMs for intersectional bias.
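
The metamorphic oracle described in the abstract compares the model's responses on an original sentence and on its mutant; a response that changes under a protected-attribute mutation is flagged as potential bias. Below is a hedged sketch of that check using a generic sentiment classifier from the Hugging Face `transformers` library as a stand-in; the actual tasks, prompts, and models HInter evaluates (GPT-3.5, Llama-2, BERT, etc.) are those described in the paper.

```python
# Sketch of a metamorphic bias oracle: the classifier's label should be invariant
# under a protected-attribute mutation; a flipped label signals potential bias.
# The sentiment task is only a stand-in for the LLM tasks used in the paper.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default small model

def exposes_bias(original: str, mutant: str) -> bool:
    """Return True if the predicted label changes between original and mutant."""
    original_label = classifier(original)[0]["label"]
    mutant_label = classifier(mutant)[0]["label"]
    return original_label != mutant_label

original = "The man was praised for his dedication."
mutant = "The Black woman was praised for her dedication."  # intersectional mutant
if exposes_bias(original, mutant):
    print("Potential intersectional bias: the prediction changed under the mutation.")
```
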
Problem

Research questions and friction points this paper is trying to address.

Intersectional bias, arising from combinations of protected attributes (e.g., race and gender), is hard to detect in LLMs because it requires complex, multi-attribute test inputs.
Naive sentence mutation yields many invalid inputs, so automated bias testing needs a way to suppress false positives.
Some intersectional bias errors are hidden: their corresponding atomic (single-attribute) cases trigger no bias, so single-attribute evaluation misses them.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines mutation analysis, dependency parsing, and metamorphic oracles
Systematically mutates sentences to generate intersectional test inputs
Uses a dependency invariant to discard invalid mutants and reduce false positives (see the sketch below)
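
As a rough illustration of the dependency-invariant idea, the sketch below uses spaCy to compare the dependency structure of an original sentence and its mutant, keeping only mutants whose parse structure is preserved. The exact invariant HInter uses is not reproduced here; in particular, insertion-style mutations would need a more tolerant comparison than this length-preserving one.

```python
# Sketch of a dependency invariant, assuming spaCy ("en_core_web_sm") for parsing.
# A substitution mutant is kept only if its dependency structure matches the
# original's; structurally broken mutants are discarded as likely false positives.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_signature(sentence: str):
    """Sequence of (dependency label, head offset) pairs, ignoring surface words."""
    doc = nlp(sentence)
    return [(token.dep_, token.head.i - token.i) for token in doc]

def satisfies_invariant(original: str, mutant: str) -> bool:
    """Keep the mutant only if parsing yields the same dependency structure."""
    return dependency_signature(original) == dependency_signature(mutant)

print(satisfies_invariant(
    "The doctor said he was tired.",
    "The doctor said she was tired.",  # word-level substitution, structure preserved
))
```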