🤖 AI Summary
To address scalability and computability bottlenecks in bias detection for high-risk AI systems, this paper introduces two bias metrics, Maximum Subgroup Discrepancy (MSD) and subsampled ℓ∞ distance, designed to avoid the intractability of conventional distance-based measures in high-dimensional, large-scale settings. Building on these metrics, the authors provide an open-source Python toolkit with a modular API, standardized evaluation pipelines, and documentation that includes usage examples for multiple scenarios. According to the summary, empirical evaluation shows that the methods retain statistical sensitivity while substantially reducing computational cost, which makes large-scale model auditing practical. The framework is aligned with regulatory requirements such as the EU AI Act and is intended as a deployable, verifiable technical basis for AI compliance assessment.
📝 Abstract
There is a strong recent emphasis on trustworthy AI. In particular, international regulations, such as the AI Act, demand that AI practitioners measure data quality on the input and estimate bias on the output of high-risk AI systems. However, there are many challenges involved, including scalability (MMD) and computability (Wasserstein-1) issues of traditional methods for estimating distances on measure spaces. Here, we present humancompatible.detect, a toolkit for bias detection that addresses these challenges. It incorporates two newly developed methods to detect and evaluate bias: maximum subgroup discrepancy (MSD) and subsampled $\ell_\infty$ distances. It has an easy-to-use API documented with multiple examples. humancompatible.detect is licensed under the Apache License, Version 2.0.
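To make the idea of a subgroup discrepancy concrete, here is a minimal brute-force sketch in plain Python. It does not use the humancompatible.detect API, and it is not the toolkit's algorithm. It assumes that MSD can be read as the largest absolute difference, over subgroups defined by conjunctions of protected-attribute values, between the share of that subgroup in two samples. The function names `subgroup_share` and `brute_force_msd` and the toy data are hypothetical.

```python
from itertools import combinations, product


def subgroup_share(rows, conditions):
    """Fraction of rows that match every (attribute, value) pair."""
    if not rows:
        return 0.0
    hits = sum(all(row[a] == v for a, v in conditions) for row in rows)
    return hits / len(rows)


def brute_force_msd(sample_p, sample_q, protected):
    """Largest absolute gap in subgroup share between two samples.

    Assumption: a subgroup is a conjunction of equalities on one or more
    protected attributes. The enumeration is exponential in the number of
    protected attributes, so this is only usable on toy inputs.
    """
    values = {
        a: sorted({row[a] for row in sample_p + sample_q}) for a in protected
    }
    best_gap, best_subgroup = 0.0, ()
    for k in range(1, len(protected) + 1):
        for attrs in combinations(protected, k):
            for combo in product(*(values[a] for a in attrs)):
                conditions = tuple(zip(attrs, combo))
                gap = abs(
                    subgroup_share(sample_p, conditions)
                    - subgroup_share(sample_q, conditions)
                )
                if gap > best_gap:
                    best_gap, best_subgroup = gap, conditions
    return best_gap, best_subgroup


# Toy data: both samples have identical marginals for sex and age,
# but they differ on the intersections of the two attributes.
sample_p = [
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
]
sample_q = [
    {"sex": "F", "age": "old"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "young"},
]

gap, subgroup = brute_force_msd(sample_p, sample_q, ["sex", "age"])
print(gap, subgroup)
```

In this toy case every single-attribute subgroup has a gap of zero, and the discrepancy only appears once intersections are considered. The exhaustive enumeration also shows why naive approaches scale poorly as the number of protected attributes grows.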