🤖 AI Summary
Identity bias—manifesting as either sycophancy bias (uncritical adoption of peers’ views) or self-bias (excessive adherence to one’s own stance)—pervasively undermines reasoning reliability in Multi-Agent Debate (MAD). Method: We propose the first unified modeling framework for both biases, introducing an identity-weighted Bayesian updating mechanism and a quantifiable Identity Bias Coefficient (IBC). To mitigate bias, we anonymize agent responses, removing identity cues and compelling agents to evaluate content rather than source. Our approach integrates de-identified prompting design, IBC measurement, and a multi-round debate experimental protocol. Contribution/Results: Empirical evaluation reveals sycophancy bias is significantly stronger than self-bias. Response anonymization reduces average IBC by 37.2% and improves reasoning consistency by 21.8%, substantially enhancing the robustness and credibility of MAD systems.
📝 Abstract
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.