On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback

📅 2025-03-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses dynamic unfairness in hiring arising from affinity bias: recruiters unconsciously favor candidates similar to themselves, and the bias intensity escalates with the proportion of like-minded individuals on the hiring committee, creating a self-reinforcing feedback loop that exacerbates inequality. The core challenge lies in making decisions based on true candidate ability while only observing biased, "apparent value" signals. To this end, we formalize the problem for the first time as the *Affinity Bandit*, where observed rewards depend on historical actions and exhibit implicit, time-varying, policy-dependent bias. Theoretically, we derive the first instance-dependent regret lower bound. Algorithmically, we propose an adaptive elimination algorithm that operates without access to ground-truth reward labels and prove its regret is nearly tight with respect to the lower bound. Experiments demonstrate that our method significantly outperforms baselines such as UCB and effectively mitigates bias accumulation under simulated feedback loops.
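The feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch under assumed specifics: the additive bias model, the bias strength, and the greedy stand-in recruiter are all illustrative choices, not the paper's actual formalization. The point it demonstrates is only that when an arm's observed reward inflates with the fraction of past selections of that arm, a learner that trusts its biased observations can lock in early choices.

```python
import random

def affinity_sim(T=5000, mu=(0.6, 0.5), bias=0.8, seed=1):
    """Toy affinity-feedback loop (illustrative; the paper's exact bias
    model differs). The observed reward of arm k is its true mean plus a
    bias term proportional to the fraction of past selections ("hires")
    that chose arm k, so selecting an arm inflates its apparent value."""
    rng = random.Random(seed)
    hires = [0, 0]                 # past hires per trait group
    sums = [0.0, 0.0]              # running sums of *observed* rewards
    pulls = [0, 0]
    for t in range(T):
        if t < 2:                  # pull each arm once to initialize
            k = t
        else:                      # greedy on biased empirical means
            means = [sums[i] / pulls[i] for i in range(2)]
            k = 0 if means[0] >= means[1] else 1
        frac = hires[k] / max(1, sum(hires))
        obs = mu[k] + bias * frac + rng.gauss(0, 0.05)
        sums[k] += obs
        pulls[k] += 1
        hires[k] += 1
    return pulls

pulls = affinity_sim()
print("pulls per arm:", pulls)
```

Because the learner only ever sees the biased `obs`, the empirical means drift away from the true means `mu`, which is the failure mode that motivates an algorithm robust to policy-dependent bias.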

๐Ÿ“ Abstract
Unconscious bias has been shown to influence how we assess our peers, with consequences for hiring, promotions and admissions. In this work, we focus on affinity bias, the component of unconscious bias which leads us to prefer people who are similar to us, despite no deliberate intention of favoritism. In a world where the people hired today become part of the hiring committee of tomorrow, we are particularly interested in understanding (and mitigating) how affinity bias affects this feedback loop. This problem has two distinctive features: 1) we only observe the biased value of a candidate, but we want to optimize with respect to their real value; 2) the bias towards a candidate with a specific set of traits depends on the fraction of people on the hiring committee with the same set of traits. We introduce a new bandit variant that exhibits these two features, which we call affinity bandits. Unsurprisingly, classical algorithms such as UCB often fail to identify the best arm in this setting. We prove a new instance-dependent regret lower bound, which is larger than that of the standard bandit setting by a multiplicative function of $K$. Since the rewards are time-varying and depend on the policy's past actions, deriving this lower bound requires proof techniques beyond the standard bandit toolbox. Finally, we design an elimination-style algorithm that nearly matches this regret, despite never observing the real rewards.
Problem

Research questions and friction points this paper is trying to address.

Mitigating affinity bias in hiring and promotions
Optimizing candidate selection despite biased feedback
Developing algorithms to handle evolving biased rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces affinity bandits for bias mitigation
Develops new regret lower bound proof techniques
Designs elimination algorithm for unobserved rewards
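To make the "elimination algorithm" contribution concrete, here is a generic successive-elimination skeleton. This is a sketch of the *style* of algorithm only: the function name, confidence-width formula, and elimination rule are standard textbook choices, not the paper's actual method, which additionally accounts for the evolving bias. One intuition for why elimination suits this setting is that round-robin sampling keeps surviving arms' pull counts equal, so a bias driven by selection fractions affects them symmetrically and pairwise comparisons stay meaningful.

```python
import math

def successive_elimination(pull, K, T, delta=0.05):
    """Generic phased-elimination skeleton (illustrative; the paper's
    algorithm also corrects for policy-dependent bias). `pull(k)`
    returns one observed reward for arm k; arms whose empirical mean
    falls below the best by twice the confidence width are dropped."""
    active = list(range(K))
    sums = [0.0] * K
    n = 0                          # completed round-robin passes
    t = 0                          # total pulls spent
    while t < T and len(active) > 1:
        for k in active:           # one pull per surviving arm
            sums[k] += pull(k)
            t += 1
        n += 1
        width = math.sqrt(2 * math.log(4 * K * n * n / delta) / n)
        best = max(sums[k] / n for k in active)
        active = [k for k in active if sums[k] / n >= best - 2 * width]
    return active

# Usage with a hypothetical 3-armed instance (deterministic rewards):
mu = [0.9, 0.6, 0.3]
surviving = successive_elimination(lambda k: mu[k], K=3, T=30000)
print("surviving arms:", surviving)
```

With well-separated means and a large enough budget, only the best arm survives; the paper's contribution is making such guarantees hold even though `pull(k)` returns biased, history-dependent observations.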