Tell Me You're Biased Without Telling Me You're Biased -- Toward Revealing Implicit Biases in Medical LLMs

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of detecting latent biases in large language models (LLMs) deployed in healthcare—a critical yet underexplored fairness concern. We propose the first multi-hop adversarial probing framework that integrates medical knowledge graphs (KGs) with auxiliary LLMs. Our method constructs a structured, semantically grounded KG to encode domain-specific relationships, applies targeted adversarial perturbations to expose hidden biases, and leverages multi-hop reasoning jointly with an auxiliary LLM to identify subtle, cross-entity and cross-attribute bias patterns. Evaluated across three medical benchmarks, six state-of-the-art LLMs, and five bias categories (e.g., gender, race, geography), our approach significantly outperforms existing baselines. It achieves substantial improvements in bias detection rate, interpretability, and cross-model generalizability—establishing a scalable, automated paradigm for fairness assessment in medical AI.
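The multi-hop reasoning over a medical KG described above can be illustrated with a minimal sketch. The KG contents, relation names, and traversal logic below are purely illustrative assumptions, not taken from the paper; they show only the general idea of enumerating relation chains that could later be verbalized into probe questions for a target LLM.

```python
# Hypothetical sketch: multi-hop traversal over a tiny medical KG to
# generate probe chains. All entity and relation names are illustrative.
medical_kg = {
    "hypertension": [("treated_by", "lisinopril")],
    "lisinopril":   [("contraindicated_with", "pregnancy")],
    "pregnancy":    [("associated_with", "female")],
}

def multi_hop_paths(kg, start, hops):
    """Enumerate relation chains of exactly `hops` hops from `start`."""
    paths = [[start]]
    for _ in range(hops):
        paths = [path + [rel, tail]
                 for path in paths
                 for rel, tail in kg.get(path[-1], [])]
    return paths

# Each resulting chain (entity, relation, entity, ...) could be turned
# into a natural-language question posed to the LLM under evaluation.
chains = multi_hop_paths(medical_kg, "hypertension", 3)
```

A chain ending in a protected attribute (here, `"female"`) is the kind of cross-entity, cross-attribute path where subtle bias might surface.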

📝 Abstract
Large language models (LLMs) used in medical applications are known to exhibit biased and unfair behavior. Before these models are adopted for clinical decision-making, it is crucial to identify such bias patterns so that their impact can be effectively mitigated. In this study, we present a novel framework that combines knowledge graphs (KGs) with auxiliary LLMs to systematically reveal complex bias patterns in medical LLMs. Specifically, the proposed approach integrates adversarial perturbation techniques to surface subtle bias patterns, and adopts a customized multi-hop characterization of KGs to support the systematic evaluation of arbitrary LLMs. Through comprehensive experiments on three datasets, six LLMs, and five bias types, we show that the proposed framework reveals complex bias patterns with noticeably greater effectiveness and scalability than existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Identify biased patterns in medical LLMs
Reveal complex biases using knowledge graphs
Enhance bias detection with adversarial techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines knowledge graphs with auxiliary LLMs
Uses adversarial perturbation for subtle biases
Custom multi-hop KG characterization for evaluation
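The adversarial-perturbation idea in the bullets above can be sketched as a counterfactual probe: swap a demographic attribute in an otherwise identical clinical prompt and flag pairs where the model's answer changes. Everything below is a hedged illustration under assumed names; the stub model, prompt template, and divergence check are not from the paper.

```python
from itertools import combinations

def build_probes(template, attribute_values):
    """Fill a clinical prompt template with each demographic value."""
    return {v: template.format(attr=v) for v in attribute_values}

def detect_divergence(answers):
    """Return attribute pairs whose answers differ. Divergence on a
    clinically irrelevant attribute is a candidate bias signal."""
    return [(a, b) for a, b in combinations(answers, 2)
            if answers[a] != answers[b]]

def toy_model(prompt):
    """Deliberately biased stand-in for a medical LLM (hypothetical)."""
    return "high dose" if " male " in prompt else "standard dose"

template = "A {attr} patient, 54, presents with chest pain. Recommend treatment."
probes = build_probes(template, ["male", "female"])
answers = {attr: toy_model(p) for attr, p in probes.items()}
flags = detect_divergence(answers)  # non-empty => bias signal
```

In a real setting, `toy_model` would be replaced by a call to the LLM under evaluation, and the equality check by a semantic comparison, since surface wording may differ without a meaningful treatment difference.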