Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness

📅 2025-05-12

🤖 AI Summary

Anonymization techniques for data publishing (k-anonymity, ℓ-diversity, and t-closeness) have a dual effect on machine learning fairness: they tend to improve individual fairness while severely degrading group fairness. Method: We conduct a quantitative empirical study across multiple datasets and privacy settings, auditing how anonymization affects fairness metrics. We identify input homogenization as the mechanism driving the individual fairness gains and evaluate privacy, fairness, and utility jointly. Contribution/Results: Strong anonymization worsens group fairness metrics, such as statistical parity difference, by up to four orders of magnitude, while boosting individual fairness by 37% on average. We open-source a complete toolchain, including preprocessing, anonymization, fairness auditing, and benchmark results, to enable reproducible assessment. This work offers empirical insights and practical guidelines for co-designing privacy-preserving and fair machine learning systems.

📝 Abstract

Machine learning (ML) algorithms depend heavily on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raises critical privacy concerns. Anonymization techniques have emerged as a practical solution to address these issues by generalizing features or suppressing data to make it more difficult to accurately identify individuals. Although recent studies have shown that privacy-enhancing technologies can influence ML predictions across different subgroups, thus affecting fair decision-making, the specific effects of anonymization techniques, such as $k$-anonymity, $\ell$-diversity, and $t$-closeness, on ML fairness remain largely unexplored. In this work, we systematically audit the impact of anonymization techniques on ML fairness, evaluating both individual and group fairness. Our quantitative study reveals that anonymization can degrade group fairness metrics by up to four orders of magnitude. Conversely, similarity-based individual fairness metrics tend to improve under stronger anonymization, largely as a result of increased input homogeneity. By analyzing varying levels of anonymization across diverse privacy settings and data distributions, this study provides critical insights into the trade-offs between privacy, fairness, and utility, offering actionable guidelines for responsible AI development. Our code is publicly available at: https://github.com/hharcolezi/anonymity-impact-fairness.
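To make the generalization step described in the abstract concrete, here is a minimal sketch of how coarsening a quasi-identifier raises the $k$ that a table satisfies. It is not taken from the paper's repository: the helper names (`generalize_age`, `k_of`), the toy records, and the choice of 10-year age bins are all assumptions made for illustration.

```python
from collections import Counter


def generalize_age(age, width):
    """Replace an exact age with a coarse interval label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_of(records, quasi_identifiers):
    """Return the size of the smallest equivalence class over the
    quasi-identifiers, i.e. the largest k for which the table is k-anonymous."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


# Hypothetical toy records (not from any dataset used in the paper).
records = [
    {"age": 31, "zip": "75001"},
    {"age": 34, "zip": "75001"},
    {"age": 38, "zip": "75001"},
    {"age": 42, "zip": "75002"},
    {"age": 47, "zip": "75002"},
]

# Every raw record is unique on (age, zip), so the raw table is only 1-anonymous.
raw_k = k_of(records, ["age", "zip"])

# Generalizing age into 10-year bins merges records into larger classes.
generalized = [{**r, "age": generalize_age(r["age"], 10)} for r in records]
gen_k = k_of(generalized, ["age", "zip"])

print(raw_k, gen_k)
```

After generalization, the three records in the first class are indistinguishable on their quasi-identifiers, which is the kind of input homogeneity the abstract links to improved similarity-based individual fairness.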

Problem

Research questions and friction points this paper is trying to address.

Auditing anonymization's effect on ML fairness metrics

Exploring trade-offs between privacy, fairness, and utility

Evaluating individual vs group fairness under anonymization techniques
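The distinction between group and individual fairness can be sketched with two simple metrics. The first is the standard statistical parity difference; the second, `similar_pair_agreement`, is a hypothetical stand-in for a similarity-based individual fairness score and is not claimed to be the metric used in the paper. The toy predictions, group labels, and the threshold `eps` are likewise assumptions.

```python
from itertools import combinations


def statistical_parity_difference(y_pred, group):
    """Group fairness: P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def similar_pair_agreement(x, y_pred, eps):
    """Individual fairness (illustrative): among pairs whose one-dimensional
    inputs differ by at most eps, the fraction that get the same prediction."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(x)), 2)
        if abs(x[i] - x[j]) <= eps
    ]
    if not pairs:
        return 1.0  # no similar pairs, so nothing is treated inconsistently
    return sum(y_pred[i] == y_pred[j] for i, j in pairs) / len(pairs)


# Hypothetical predictions for eight people, four in each group.
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0]
spd = statistical_parity_difference(y_pred, group)

# Hypothetical one-dimensional inputs: two pairs of near-identical individuals.
x = [1.0, 1.2, 3.0, 3.1]
agreement = similar_pair_agreement(x, [1, 0, 1, 1], eps=0.5)

print(spd, agreement)
```

A value of `spd` far from zero signals unequal positive rates across groups, while `agreement` close to 1 signals that similar individuals are treated alike. The two can move in opposite directions, which is the tension the paper audits.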

Innovation

Methods, ideas, or system contributions that make the work stand out.

Audits anonymization impact on ML fairness

Evaluates the effects of k-anonymity, ℓ-diversity, and t-closeness

Quantifies privacy-fairness-utility trade-offs
