🤖 AI Summary
This work targets individual fairness violations in deep neural networks. Existing fairness repair methods are typically data-centric and often lack provable guarantees and generalization to unseen samples. To bridge this gap, the authors propose ProF, a fairness repair framework with provable guarantees. ProF uses interval bound propagation to soundly bound model outputs over the neighborhood of a biased input, then encodes fairness constraints and model modifications as a single constraint-solving problem that is transformed into a mixed-integer linear program (MILP). The framework can be configured to support multiple sensitive attributes and more practical fairness definitions. On four benchmark datasets, ProF reaches generalization of up to 95.93% on full datasets and 93.16% on the entire input space, and delivers around 90% fairness improvement.
📝 Abstract
Deep neural networks (DNNs) suffer from ethical issues such as individual discrimination. In response, many neural network (NN) repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric and often lack provable guarantees and generalization to unseen samples. To overcome these limitations, we propose ProF, a novel fairness repair framework with provable guarantees. The key intuition of ProF is to leverage interval bound propagation (a widely used NN verification technique) to soundly capture model outputs over the whole set $S(\mathbf{x})$ around a biased sample $\mathbf{x}$. The derived bounds guide a fairness repair that encourages the model to produce consistent outputs on $S(\mathbf{x})$. Specifically, we integrate fairness constraints and model modifications into a unified constraint-solving formulation, which can be transformed into a Mixed-Integer Linear Programming (MILP) problem solvable by off-the-shelf solvers. The solution to the MILP problem induces a repaired model with guaranteed fairness over the whole set $S(\mathbf{x})$. We evaluate ProF on four widely used benchmark datasets and demonstrate that it achieves provable fairness repair, with generalization of up to 95.93% on full datasets and 93.16% on the entire input space. Notably, ProF can be easily configured to support multiple sensitive attributes and more practical fairness definitions, while still providing provable repair guarantees and delivering around 90% fairness improvement. Our code is available at https://github.com/nninjn/ProF.
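
To make the role of interval bound propagation concrete, here is a minimal pure-Python sketch, not the authors' implementation. It propagates an input box through a toy ReLU network and checks whether the sign of the output logit is provably the same over the whole box. The network weights, the choice of $S(\mathbf{x})$ as "fix the non-sensitive feature, let the sensitive attribute range over $[0, 1]$", and the helper names (`affine_bounds`, `ibp`, `certified_consistent`, `min_output_bias_shift`) are all assumptions made for illustration. The final step replaces the paper's MILP with a much simpler stand-in: if only the output bias may change, the bounds shift by the same amount, so the smallest repairing shift has a closed form.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Sound bounds of W @ x + b for every x with lo <= x <= hi (elementwise)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l, h = bias, bias
        for w, a, c in zip(row, lo, hi):
            # A positive weight maps the lower input bound to the lower output
            # bound; a negative weight swaps the roles of the two bounds.
            if w >= 0:
                l += w * a
                h += w * c
            else:
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def ibp(layers: List[Layer], lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Propagate a box through affine layers with ReLU between them."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # no activation after the output layer
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi


def certified_consistent(lo: float, hi: float) -> bool:
    """Binary decision = sign of the logit; consistent if the interval avoids 0."""
    return lo > 0.0 or hi < 0.0


def min_output_bias_shift(lo: float, hi: float, margin: float) -> float:
    """Smallest shift of the output bias that moves [lo, hi] past +/- margin.

    Simplified stand-in for a MILP-based repair (assumption): only the
    output bias is allowed to change.
    """
    up = max(0.0, margin - lo)      # push the whole interval above +margin
    down = min(0.0, -margin - hi)   # or push it below -margin
    return up if abs(up) <= abs(down) else down


# Toy network: 2 inputs (non-sensitive feature, sensitive attribute) -> 2 -> 1.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.5]),
]

# Assumed S(x): non-sensitive feature fixed at 2.0, sensitive attribute in [0, 1].
box_lo, box_hi = [2.0, 0.0], [2.0, 1.0]

lo_out, hi_out = ibp(layers, box_lo, box_hi)
print("before repair:", lo_out, hi_out, certified_consistent(lo_out[0], hi_out[0]))

# Repair by shifting only the output bias, then re-verify with IBP.
shift = min_output_bias_shift(lo_out[0], hi_out[0], margin=0.25)
W_out, b_out = layers[-1]
repaired = layers[:-1] + [(W_out, [b_out[0] + shift])]

lo_fix, hi_fix = ibp(repaired, box_lo, box_hi)
print("after repair: ", lo_fix, hi_fix, certified_consistent(lo_fix[0], hi_fix[0]))
```

In this toy case the output interval before repair is $[-0.5, 2.5]$, so the decision can flip as the sensitive attribute varies; after a bias shift of $0.75$ it becomes $[0.25, 3.25]$ and the decision is provably the same on all of the assumed $S(\mathbf{x})$. A real repair would modify more parameters and would need the MILP formulation described in the abstract, since the bounds then depend on the modifications in a non-trivial way.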