🤖 AI Summary
This paper addresses real-time runtime verification of algorithmic fairness in machine learning systems with unknown but Markovian dynamics, focusing on dynamically quantifying and certifying decision bias with respect to sensitive attributes under partial or full observability.
Method: We propose a formal specification language expressive enough to encode multiple fairness notions and develop two statistical monitoring algorithms—offering uniform and non-uniform error bounds—to enable progressively precise, confidence-guaranteed quantitative fairness verification. Our approach integrates Markov chain modeling, sequential observation analysis, and lightweight quantitative verification.
Contribution/Results: The prototype system achieves millisecond-scale response times on loan approval and university admission benchmarks, significantly improving the timeliness, reliability, and scalability of fairness monitoring compared to existing methods.
📝 Abstract
Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes.
We present a general framework of runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure, with or without full observation of the state space.
We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden.
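As a concrete illustration of one such property (using our own notation, not the paper's specification language): demographic parity requires that the decision probability be independent of the sensitive attribute,

```latex
\[
  \Pr\big[\hat{Y} = 1 \mid A = a\big] \;=\; \Pr\big[\hat{Y} = 1 \mid A = b\big],
\]
```

so the quantity a monitor tracks is the gap \(\Pr[\hat{Y}=1 \mid A=a] - \Pr[\hat{Y}=1 \mid A=b]\), which is zero for a perfectly fair system.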
We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time.
The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer.
We present two categories of monitoring algorithms, namely ones with a uniform error bound across all time points, and ones with weaker non-uniform, pointwise error bounds at different time points.
Our monitoring algorithms use statistical tools that are adapted to suit the dynamic requirements of monitoring and the special needs of the fairness specifications.
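The pointwise (non-uniform) flavor of such a monitor can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it estimates the demographic-parity gap between two hypothetical groups `"A"` and `"B"` from a stream of (group, decision) observations, with a Hoeffding-style error bound that tightens as the observed sequence grows; all names and the choice of concentration inequality are our assumptions.

```python
import math

class FairnessMonitor:
    """Illustrative sketch (not the paper's algorithm): monitor the
    demographic-parity gap P(accept | A) - P(accept | B) from a stream
    of observations, with a pointwise Hoeffding confidence interval."""

    def __init__(self, delta=0.05):
        self.delta = delta  # the verdict holds with confidence >= 1 - delta
        self.counts = {"A": [0, 0], "B": [0, 0]}  # group -> [accepted, total]

    def observe(self, group, accepted):
        """Record one decision event for the given group."""
        self.counts[group][1] += 1
        if accepted:
            self.counts[group][0] += 1

    def verdict(self):
        """Return (estimated gap, error bound), or None if a group is unseen."""
        (a_acc, a_n), (b_acc, b_n) = self.counts["A"], self.counts["B"]
        if a_n == 0 or b_n == 0:
            return None
        gap = a_acc / a_n - b_acc / b_n
        # Hoeffding bound per group, splitting the confidence budget
        # between the two acceptance-rate estimates.
        eps = (math.sqrt(math.log(4 / self.delta) / (2 * a_n))
               + math.sqrt(math.log(4 / self.delta) / (2 * b_n)))
        return gap, eps
```

Each `observe` call is O(1), which is consistent with the millisecond-scale per-observation updates reported above. Note that this pointwise bound is only valid at each fixed time; a uniform bound across all time points, as in the paper's first category of monitors, requires time-uniform concentration arguments instead.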
Using a prototype implementation, we show how we can monitor whether a bank is fair in giving loans to applicants from different social backgrounds, and whether a college is fair in admitting students while maintaining a reasonable financial burden on society.
In these experiments, our monitors took less than a millisecond to update their verdicts after each observation.