🤖 AI Summary
This study investigates how restricted data access undermines the reliability of algorithmic fairness audits. Using simulation models grounded in real-world parole and healthcare prediction scenarios, we quantify estimation errors in group-level fairness metrics (e.g., demographic parity) under three data access regimes: aggregated statistics, individual-level data with model outputs, and individual-level data without model outputs. We provide the first systematic evidence that data minimization and anonymization substantially inflate audit error—exceeding 300% under anonymized individual data—while aggregated statistics exhibit surprising robustness for certain metrics. Building on these findings, we propose a data-access tiering framework designed to enhance auditability, bridging practical gaps among regulators, auditors, and human-AI interaction designers. Our results offer empirically grounded guidance for data-sharing policies governing fairness auditing.
📝 Abstract
Independent algorithm audits hold the promise of bringing accountability to automated decision-making. However, third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. To study how these limitations impact research integrity, we conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We examine the accuracy of estimating group parity metrics across three levels of access: (a) aggregated statistics, (b) individual-level data with model outputs, and (c) individual-level data without model outputs. Despite selecting one of the simplest tasks for algorithmic auditing, we find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments. We discuss implications for independent auditors, as well as potential avenues for HCI researchers and regulators to improve data access and enable both reliable and holistic evaluations.