🤖 AI Summary
This study identifies a systemic bias in healthcare equity assessment arising from delayed reporting of demographic attributes (e.g., race/ethnicity). Leveraging electronic health records from over 5 million patients, we first quantify the heterogeneous, spatiotemporally uneven patterns of such delays across population subgroups, then develop a missingness mechanism model alongside a multi-level (national/state/clinic) temporal attribution framework. We find substantial average delays with pronounced intergroup disparities, causing directional misclassification in over 32% of state-level and 68% of clinic-level health disparity conclusions. Conventional imputation reduces misclassification by only 11%, underscoring the critical impact of timeliness bias. The paper introduces a novel fairness evaluation paradigm that explicitly accounts for data pipeline latency, establishing both a methodological foundation and practical guidelines for real-world equity auditing.
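The "conventional imputation" referenced above typically means surname-and-geography-based probabilistic race imputation in the style of BISG (Bayesian Improved Surname Geocoding). A minimal sketch under that assumption follows; every probability table below is a made-up illustrative number, not real census or study data.

```python
# Hypothetical BISG-style race imputation sketch. All probabilities are
# illustrative placeholders, NOT real census data.
P_RACE_GIVEN_SURNAME = {
    "garcia": {"white": 0.05, "black": 0.01, "hispanic": 0.92, "asian": 0.02},
    "smith":  {"white": 0.70, "black": 0.23, "hispanic": 0.03, "asian": 0.04},
}
P_RACE_GIVEN_GEO = {  # racial composition of a residential area (hypothetical)
    "12345": {"white": 0.60, "black": 0.20, "hispanic": 0.15, "asian": 0.05},
}
P_RACE = {"white": 0.60, "black": 0.13, "hispanic": 0.19, "asian": 0.06}

def impute_race(surname, zip_code):
    """Posterior P(race | surname, geography), assuming surname and geography
    are conditionally independent given race.

    Bayes: P(r | s, g) ∝ P(s | r) P(g | r) P(r) ∝ P(r | s) P(r | g) / P(r).
    """
    post = {}
    for r, prior in P_RACE.items():
        post[r] = (P_RACE_GIVEN_SURNAME[surname][r]
                   * P_RACE_GIVEN_GEO[zip_code][r] / prior)
    z = sum(post.values())
    return {r: p / z for r, p in post.items()}

print(impute_race("garcia", "12345"))  # posterior mass concentrates on "hispanic"
```

Note the limitation the study highlights: such imputation estimates a static probability and cannot recover which patients' demographics were merely *delayed*, so it only partially corrects timeliness bias.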
📝 Abstract
Conducting disparity assessments at regular time intervals is critical for surfacing potential biases in decision-making and improving outcomes across demographic groups. Because disparity assessments fundamentally depend on demographic information, their efficacy is limited by the availability and consistency of demographic identifiers. While prior work has considered the impact of missing data on fairness, little attention has been paid to the role of delayed demographic data. Delayed data, while eventually observed, might be missing at the critical point of monitoring and action -- and delays may be unequally distributed across groups in ways that distort disparity assessments. We characterize such impacts in healthcare, using electronic health records of over 5M patients across primary care practices in all 50 states. Our contributions are threefold. First, we document the high rate of race and ethnicity reporting delays in a healthcare setting and demonstrate widespread variation in the rates at which demographics are reported across different groups. Second, through a set of retrospective analyses using real data, we find that such delays impact disparity assessments, and hence the conclusions drawn across a range of consequential healthcare outcomes, particularly at the more granular state and practice levels. Third, we find that conventional methods for imputing missing race have limited ability to mitigate the effects of reporting delays on the accuracy of timely disparity assessments. Our insights and methods generalize to many domains of algorithmic fairness where delays in the availability of sensitive information may confound audits, thus deserving closer attention within a pipeline-aware machine learning framework.
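The core mechanism -- group-dependent reporting delays distorting disparity estimates computed at monitoring time -- can be illustrated with a minimal simulation. Everything below (group labels, outcome rates, delay distributions) is hypothetical and is not the paper's data or framework.

```python
import random

random.seed(0)

# Hypothetical cohort: two groups with different outcome rates, and
# race-reporting delays that also correlate with outcome (e.g., sicker
# patients have more visits, so their race is recorded sooner).
N = 10_000
patients = []
for _ in range(N):
    group = "A" if random.random() < 0.5 else "B"
    bad_outcome = random.random() < (0.10 if group == "A" else 0.20)
    if group == "A":
        mean_delay = 30.0                        # days until race is recorded
    else:
        mean_delay = 60.0 if bad_outcome else 240.0
    delay = random.expovariate(1.0 / mean_delay)
    patients.append((group, bad_outcome, delay))

def disparity(at_day):
    """Outcome-rate gap (B minus A) using only races reported by `at_day`."""
    rates = {}
    for g in ("A", "B"):
        obs = [bad for grp, bad, d in patients if grp == g and d <= at_day]
        rates[g] = sum(obs) / len(obs)
    return rates["B"] - rates["A"]

print(f"gap assessed at 90 days: {disparity(90):+.3f}")    # inflated by delays
print(f"gap once fully reported: {disparity(10**9):+.3f}")  # true gap ~ +0.10
```

Because group B's good-outcome patients are under-represented among those whose race is already recorded at day 90, the gap assessed at that point substantially overstates the true disparity, i.e., a directional distortion of the kind the study quantifies at the state and practice levels.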