Disaggregated Health Data in LLMs: Evaluating Data Equity in the Context of Asian American Representation

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluations of large language models (LLMs) in health communication often overlook intra-ethnic heterogeneity—particularly among Asian American subgroups (e.g., Korean, Chinese)—leading to “aggregation bias” that obscures subgroup-specific health disparities and undermines accuracy and representativeness. Method: This study pioneers the systematic integration of health data disaggregation into LLM fairness assessment, developing a rigorous evaluation framework that quantifies both the degree of ethnic-level health information disaggregation and fairness performance in LLM outputs, using hybrid statistical and machine learning techniques. Contribution/Results: Empirical analysis reveals that mainstream LLMs consistently suppress health differences across Asian American subgroups, exhibiting significant aggregation bias. This compromises clinical relevance and equity in health information delivery. The work introduces a novel, reproducible paradigm for assessing and improving AI responsibility and inclusivity in minority health contexts, accompanied by an open, quantifiable evaluation toolkit.

📝 Abstract
Large language models (LLMs), such as ChatGPT and Claude, have emerged as essential tools for information retrieval, often serving as alternatives to traditional search engines. However, ensuring that these models provide accurate and equitable information tailored to diverse demographic groups remains an important challenge. This study investigates the capability of LLMs to retrieve disaggregated health-related information for sub-ethnic groups within the Asian American population, such as Korean and Chinese communities. Data disaggregation has been a critical practice in health research to address inequities, making it an ideal domain for evaluating representation equity in LLM outputs. We apply a suite of statistical and machine learning tools to assess whether LLMs deliver appropriately disaggregated and equitable information. By focusing on Asian American sub-ethnic groups, a highly diverse population often aggregated in traditional analyses, we highlight how LLMs handle complex disparities in health data. Our findings contribute to ongoing discussions about responsible AI, particularly in ensuring data equity in the outputs of LLM-based systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' accuracy in providing health data for Asian sub-ethnic groups
Assess equity of disaggregated health information in LLM outputs
Examine LLM handling of disparities in diverse Asian American populations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Assessing LLMs for disaggregated health data equity
Focusing on Asian American sub-ethnic group disparities
Applying statistical and machine learning evaluation tools
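The evaluation idea above can be sketched minimally: if an LLM returns near-identical health answers for distinct subgroup prompts (e.g., Korean American vs. Chinese American), that uniformity is a signal of aggregation bias. The sketch below uses token-level Jaccard similarity as a stand-in metric; the responses and the `jaccard_similarity` helper are illustrative assumptions, not the paper's actual data or evaluation toolkit.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two LLM responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# Placeholder responses to subgroup-specific health prompts
# (hypothetical, for illustration only).
responses = {
    "Korean American": "Diabetes prevalence is elevated; screening is advised.",
    "Chinese American": "Diabetes prevalence is elevated; screening is advised.",
    "Asian American": "Diabetes prevalence is elevated; screening is advised.",
}

# Pairwise similarity across subgroup prompts; scores near 1.0 suggest
# the model collapses subgroups into one aggregated answer.
pairs = [("Korean American", "Chinese American"),
         ("Korean American", "Asian American")]
scores = {p: jaccard_similarity(responses[p[0]], responses[p[1]])
          for p in pairs}
```

A real study would replace Jaccard overlap with richer measures (e.g., embedding similarity or statistical tests against disaggregated ground-truth health data), but the flagging logic is the same: low divergence across subgroup-conditioned outputs indicates suppressed subgroup differences.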
Uvini Balasuriya Mudiyanselage
Arizona State University, Tempe, AZ, USA
Bharat Jayprakash
Arizona State University, Tempe, AZ, USA
Kookjin Lee
Arizona State University
K. Hazel Kwon
Arizona State University, Tempe, AZ, USA