AI Summary
This study addresses the lack of systematic evaluation metrics for assessing cultural specificity at the sentence level in large language models (LLMs). To this end, the authors propose the Conceptual Cultural Index (CCI), an interpretable measure of cultural specificity grounded in relative universality. CCI quantifies the difference in a sentence's perceived universality between a target culture and other cultures, as estimated by LLMs, yielding a calibrated score that allows users to flexibly define the set of comparison cultures. Experimental results on a 400-sentence benchmark demonstrate that CCI significantly outperforms direct LLM scoring, achieving over a 10-percentage-point improvement in AUC when applied to culture-specialized models, while offering superior discriminative power and operational flexibility.
Abstract
Large language models (LLMs) are increasingly deployed in multicultural settings; however, systematic evaluation of cultural specificity at the sentence level remains underexplored. We propose the Conceptual Cultural Index (CCI), which estimates the cultural specificity of a sentence. CCI is defined as the difference between the generality estimate within the target culture and the average generality estimate across other cultures. This formulation lets users operationally control the scope of culture via the choice of comparison cultures, and it provides interpretability, since the score derives directly from the underlying generality estimates. We validate CCI on 400 sentences (200 culture-specific and 200 general), and the resulting score distribution exhibits the anticipated pattern: higher for culture-specific sentences and lower for general ones. For binary separability, CCI outperforms direct LLM scoring, yielding more than a 10-point improvement in AUC for models specialized to the target culture. Our code is available at https://github.com/IyatomiLab/CCI.
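The definition above can be sketched as a few lines of code. This is a minimal illustration, not the authors' implementation: it assumes the LLM-produced generality estimates are already available as scalars (e.g., in [0, 1]), and the function name and example values are hypothetical.

```python
import statistics

def cci(target_generality: float, other_generalities: list[float]) -> float:
    """Conceptual Cultural Index: the sentence's generality estimate within
    the target culture minus the mean generality estimate across the chosen
    comparison cultures. Positive values suggest culture specificity."""
    return target_generality - statistics.mean(other_generalities)

# Hypothetical generality estimates for one sentence: it reads as highly
# general within the target culture but not within the comparison cultures,
# so CCI is high, flagging the sentence as culture-specific.
score = cci(0.9, [0.2, 0.3, 0.1])
print(round(score, 2))  # 0.9 - mean(0.2, 0.3, 0.1) = 0.7
```

Because the comparison set is an explicit argument, narrowing or widening it operationalizes the "scope of culture" the abstract refers to, and each score can be traced back to the individual generality estimates.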