Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization

📅 2025-10-08

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

Identity bias—manifesting as either sycophancy bias (uncritical adoption of peers’ views) or self-bias (excessive adherence to one’s own stance)—pervasively undermines reasoning reliability in Multi-Agent Debate (MAD). Method: We propose the first unified modeling framework for both biases, introducing an identity-weighted Bayesian updating mechanism and a quantifiable Identity Bias Coefficient (IBC). To mitigate bias, we anonymize agent responses, removing identity cues and compelling agents to evaluate content rather than source. Our approach integrates de-identified prompting design, IBC measurement, and a multi-round debate experimental protocol. Contribution/Results: Empirical evaluation reveals sycophancy bias is significantly stronger than self-bias. Response anonymization reduces average IBC by 37.2% and improves reasoning consistency by 21.8%, substantially enhancing the robustness and credibility of MAD systems.

Technology Category

Application Category

📝 Abstract

Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.

Problem

Research questions and friction points this paper is trying to address.

Mitigating identity-driven sycophancy and self-bias in multi-agent debate systems

Quantifying identity bias through a principled metric and anonymization framework

Ensuring reasoning reliability by removing agent identity markers from debates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Formalizes debate dynamics via identity-weighted Bayesian update

Proposes response anonymization to remove identity markers

Defines Identity Bias Coefficient metric to measure bias

🔎 Similar Papers

Stereotype or Personalization? User Identity Biases Chatbot Recommendations