📝 Abstract
Traditional efforts to measure historical structural oppression struggle with cross-national validity because histories of exclusion, colonization, and social status are uniquely and locally specified in each country; they have also relied largely on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured self-identified ethnicity utterances from a multilingual global COVID-19 study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimates of oppression, and we systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens on how oppression manifests in data-driven research and public health contexts. To support reproducible evaluation, we release an open-source benchmark dataset for assessing LLMs on oppression measurement (https://github.com/chattergpt/llm-oppression-benchmark).
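As a rough illustration of the rule-guided prompting idea, the sketch below assembles a scoring prompt from an explicit rule list and parses a structured model reply. The rule wording, the 0-4 scale, the example utterance, and all function names are illustrative assumptions, not the paper's actual prompts or rubric; the model reply is mocked rather than produced by an API call.

```python
import json

# Hypothetical rules; the paper's actual rule set is not reproduced here.
RULES = [
    "Score historical disadvantage on a 0-4 scale "
    "(0 = historically dominant group, 4 = severe historical oppression).",
    "Judge the group relative to its own national context, not a global baseline.",
    "Weigh identity-based exclusion (e.g., colonization, caste, indigeneity), "
    "not only material resources.",
    'Reply with JSON only: {"score": <int>, "rationale": <short string>}.',
]

def build_prompt(ethnicity_utterance: str, country: str) -> str:
    """Assemble a rule-guided scoring prompt for a single self-identified
    ethnicity utterance in its national context."""
    rule_block = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(RULES))
    return (
        "You are rating historical structural oppression.\n"
        f"Rules:\n{rule_block}\n\n"
        f"Country: {country}\n"
        f'Self-identified ethnicity: "{ethnicity_utterance}"'
    )

def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply and validate the score range."""
    obj = json.loads(raw)
    if not (isinstance(obj.get("score"), int) and 0 <= obj["score"] <= 4):
        raise ValueError("score missing or out of range")
    return obj

prompt = build_prompt("Quechua", "Peru")
# Mocked reply standing in for an actual LLM response:
mock_reply = '{"score": 3, "rationale": "Indigenous group with colonial-era dispossession."}'
result = parse_response(mock_reply)
```

Keeping the rules explicit in the prompt, and demanding a machine-parseable reply with a rationale, is what makes the resulting scores interpretable and auditable across models.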