🤖 AI Summary
Underrepresentation in public health data risks systemic bias, undermining the validity of downstream inference and policy decisions. To address this, we propose a working definition of “public health data equity” that integrates computational principles (fairness, accountability, transparency, ethics, privacy, confidentiality) with core public health considerations (selection bias, representativeness, generalizability, causality, information bias). The result is a structured self-audit framework, grounded in reflexivity, that spans the data life cycle from study design and data collection to analysis, interpretation, and translation, and is intended for use in routine practice. Three public health cases illustrate how non-representative datasets and skewed knowledge impede decisions across diverse subgroups. Crucially, data equity is a necessary first step, but it does not by itself guarantee information, learning, or decision equity.
📝 Abstract
Data-driven decisions shape public health policies and practice, yet persistent disparities in data representation skew insights and undermine interventions. To address this, we advance a structured roadmap that integrates public health data science with computer science and is grounded in reflexivity. We adopt data equity as a guiding concept: ensuring the fair and inclusive representation, collection, and use of data to prevent the introduction or exacerbation of systemic biases that could lead to invalid downstream inference and decisions. To underscore urgency, we present three public health cases where non-representative datasets and skewed knowledge impede decisions across diverse subgroups. These challenges echo themes in two literatures: public health highlights gaps in high-quality data for specific populations, while computer science and statistics contribute criteria and metrics for diagnosing bias in data and models. Building on these foundations, we propose a working definition of public health data equity and a structured self-audit framework. Our framework integrates core computational principles (fairness, accountability, transparency, ethics, privacy, confidentiality) with key public health considerations (selection bias, representativeness, generalizability, causality, information bias) to guide equitable practice across the data life cycle, from study design and data collection to measurement, analysis, interpretation, and translation. Embedding data equity in routine practice offers a practical path for ensuring that data-driven policies, artificial intelligence, and emerging technologies improve health outcomes for all. Finally, we emphasize the critical understanding that, although data equity is an essential first step, it does not inherently guarantee information, learning, or decision equity.