Bias Detection via Maximum Subgroup Discrepancy

📅 2025-02-04

📈 Citations: 0

✨ Influential: 0

career value

217K/year

🤖 AI Summary

This paper addresses three key challenges in trustworthy AI: (i) difficulty in bias detection, (ii) high sample complexity of conventional statistical distances (e.g., total variation, Wasserstein distance), and (iii) lack of subgroup-level interpretability. To this end, we propose the Maximum Subgroup Discrepancy (MSD) distance—a novel metric that jointly ensures global statistical guarantees and subgroup-level interpretability. MSD provides theoretical assurance for detecting bias across *any* feature-defined subgroup and supports both attribution analysis and actionable bias mitigation guidance. Its sample complexity is linear in the number of features, substantially lower than existing methods. We develop an efficient algorithm based on mixed-integer optimization (MIO), integrating structured pruning and convex relaxation techniques. Empirical evaluation on multiple real-world datasets shows that MSD improves detection sensitivity by 3.2× over total variation and Wasserstein distances, reduces required sample size by 87%, and yields directly interpretable outputs.

Technology Category

Application Category

📝 Abstract

Bias evaluation is fundamental to trustworthy AI, both in terms of checking data quality and in terms of checking the outputs of AI systems. In testing data quality, for example, one may study a distance of a given dataset, viewed as a distribution, to a given ground-truth reference dataset. However, classical metrics, such as the Total Variation and the Wasserstein distances, are known to have high sample complexities and, therefore, may fail to provide meaningful distinction in many practical scenarios. In this paper, we propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD). In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups. While the number of subgroups may be exponential, we show that the sample complexity is linear in the number of features, thus making it feasible for practical applications. Moreover, we provide a practical algorithm for the evaluation of the distance, based on Mixed-integer optimization (MIO). We also note that the proposed distance is easily interpretable, thus providing clearer paths to fixing the biases once they have been identified. It also provides guarantees for all subgroups. Finally, we empirically evaluate, compare with other metrics, and demonstrate the above properties of MSD on real-world datasets.

Problem

Research questions and friction points this paper is trying to address.

Detects bias in AI systems

Introduces Maximum Subgroup Discrepancy metric

Ensures interpretability and subgroup guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Subgroup Discrepancy metric

Linear sample complexity

Mixed-integer optimization algorithm

🔎 Similar Papers

MABR: Multilayer Adversarial Bias Removal Without Prior Bias Knowledge