AI Summary
This work addresses discrimination mitigation in contextual bandits by proposing the first meta-algorithmic framework that provably guarantees exact statistical parity on every round. The method transforms any efficient Hedge-based online learner (e.g., Exp4) into a fair online learning algorithm. Because the underlying Hedge instance admits non-stationarity, the distribution with respect to which parity is enforced may vary over time, so it can be estimated adaptively when the true population is unknown. Furthermore, via an online-to-batch conversion, the method yields a novel batch classifier with exact statistical parity guarantees. Theoretical analysis shows that, relative to any comparator satisfying statistical parity, its asymptotic regret bound matches that of running Exp4 independently for each protected characteristic. Key contributions include: (i) the first per-round exact statistical parity guarantee in contextual bandits; (ii) support for dynamic, non-stationary fairness constraints; and (iii) a unifying framework bridging online fair learning and batch fair classification.
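For readers unfamiliar with Hedge, the following is a minimal sketch of the standard exponential-weights update that Hedge-style learners are built on. It is not the paper's meta-algorithm; the function name `hedge_update`, the learning rate `eta`, and the example losses are illustrative assumptions.

```python
import math

def hedge_update(weights, losses, eta):
    """One generic Hedge step (illustrative, not the paper's method).

    Each weight is multiplied by exp(-eta * loss) and the result is
    renormalized so the weights again form a probability distribution.
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Four hypothetical experts starting from a uniform prior.
weights = [0.25, 0.25, 0.25, 0.25]
weights = hedge_update(weights, losses=[1.0, 0.0, 0.5, 1.0], eta=0.5)
```

After the update, the expert with the lowest loss holds the largest weight, which is the behavior a Hedge-based learner such as Exp4 relies on.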
Abstract
Motivated by the need to remove discrimination in certain applications, we develop a meta-algorithm that can convert any efficient implementation of an instance of Hedge (or equivalently, an algorithm for discrete Bayesian inference) into an efficient algorithm for the equivalent contextual bandit problem which guarantees exact statistical parity on every trial. Relative to any comparator with statistical parity, the resulting algorithm has the same asymptotic regret bound as running the corresponding instance of Exp4 for each protected characteristic independently. Given that our Hedge instance admits non-stationarity, we can handle a varying distribution with respect to which statistical parity is enforced, which is useful when the true population is unknown and needs to be estimated from the data received so far. Via online-to-batch conversion we can handle the equivalent batch classification problem with exact statistical parity, giving us results that we believe are novel and important in their own right.
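To make "exact statistical parity" concrete, the sketch below checks one common reading of the condition: under a given per-group distribution over contexts, the marginal probability of each action must be identical across protected groups. This reading, the names `group_action_marginals` and `has_statistical_parity`, and the toy policy and distributions are assumptions for illustration, not the paper's formal definition or algorithm.

```python
def group_action_marginals(policy, context_dist):
    """Marginal action probabilities per protected group.

    policy:       dict mapping context -> list of action probabilities
    context_dist: dict mapping group -> dict of context -> probability
    """
    n_actions = len(next(iter(policy.values())))
    marginals = {}
    for group, dist in context_dist.items():
        marginal = [0.0] * n_actions
        for context, p_context in dist.items():
            for action, p_action in enumerate(policy[context]):
                marginal[action] += p_context * p_action
        marginals[group] = marginal
    return marginals

def has_statistical_parity(policy, context_dist, tol=1e-9):
    """True if every group has the same action marginals (up to tol)."""
    marginals = list(group_action_marginals(policy, context_dist).values())
    reference = marginals[0]
    return all(
        abs(x - y) <= tol
        for marginal in marginals[1:]
        for x, y in zip(marginal, reference)
    )

# Hypothetical contexts x1, x2 belong to group A; x3, x4 to group B.
context_dist = {
    "A": {"x1": 0.5, "x2": 0.5},
    "B": {"x3": 0.3, "x4": 0.7},
}
fair_policy = {
    "x1": [1.0, 0.0], "x2": [0.0, 1.0],
    "x3": [0.5, 0.5], "x4": [0.5, 0.5],
}
unfair_policy = {
    "x1": [1.0, 0.0], "x2": [0.0, 1.0],
    "x3": [1.0, 0.0], "x4": [1.0, 0.0],
}

fair_ok = has_statistical_parity(fair_policy, context_dist)
unfair_ok = has_statistical_parity(unfair_policy, context_dist)
```

Note that `fair_policy` treats individual contexts differently yet still selects each action with probability 0.5 in both groups, whereas `unfair_policy` always selects the first action for group B. If the per-group context distributions are re-estimated from data over time, the same check would be applied against the current estimate.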