🤖 AI Summary
Existing LLM cultural understanding evaluation benchmarks suffer from weak theoretical foundations, poor cross-cultural scalability, and heavy reliance on manual annotation.
Method: Grounded in the cultural iceberg theory, we propose the first systematic and scalable cultural understanding evaluation framework, organized into 3 layers and 140 dimensions. It combines a fine-grained, theory-driven dimensional schema, automated multilingual knowledge base construction, and synthetic benchmark data generation, built on cultural dimension modeling, large-scale knowledge extraction, and LLM-based evaluation techniques.
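The pipeline described above (schema-guided knowledge extraction followed by benchmark generation) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the layer names, example dimensions, `KnowledgeEntry` structure, and question template are all hypothetical stand-ins for the actual 3-layer, 140-dimension schema and LLM-backed extraction.

```python
# Hypothetical sketch of a CultureScope-style pipeline.
# Layers/dimensions below are illustrative placeholders, not the real taxonomy.
from dataclasses import dataclass

# Three iceberg layers, each mapped to a few example dimensions.
SCHEMA = {
    "surface": ["cuisine", "festivals"],
    "intermediate": ["etiquette", "family_roles"],
    "deep": ["values", "taboos"],
}

@dataclass
class KnowledgeEntry:
    culture: str
    layer: str
    dimension: str
    fact: str

def build_knowledge_base(culture, extract):
    """Walk the schema and call an extractor (LLM-backed in practice) per dimension."""
    kb = []
    for layer, dims in SCHEMA.items():
        for dim in dims:
            for fact in extract(culture, layer, dim):
                kb.append(KnowledgeEntry(culture, layer, dim, fact))
    return kb

def to_eval_item(entry):
    """Turn one knowledge entry into a benchmark question (template is a placeholder)."""
    return {
        "question": f"In {entry.culture} culture ({entry.dimension}): {entry.fact}?",
        "layer": entry.layer,
    }

# Stub standing in for large-scale LLM knowledge extraction.
def stub_extract(culture, layer, dim):
    return [f"a sample {dim} fact"]
```

The point of the sketch is the scalability argument: because the schema drives both extraction and question generation, adding a new culture only requires swapping the `culture` argument, not new manual annotation.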
Contribution/Results: Experiments reveal that mainstream LLMs lack comprehensive cultural competence: despite multilingual training, they show no significant gains in deep cultural understanding. The benchmark covers diverse cultural contexts and empirically exposes widespread deficits in implicit cultural cognition across models. This work establishes the first theoretically grounded, large-scale, and reproducible evaluation benchmark for cultural alignment research.
📝 Abstract
As large language models (LLMs) are increasingly deployed in diverse cultural environments, evaluating their cultural understanding capability has become essential for ensuring trustworthy and culturally aligned applications. However, most existing benchmarks lack comprehensiveness and are difficult to scale and adapt across cultural contexts, because their frameworks often lack guidance from well-established cultural theories and tend to rely on expert-driven manual annotation. To address these issues, we propose CultureScope, the most comprehensive evaluation framework to date for assessing cultural understanding in LLMs. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification, comprising 3 layers and 140 dimensions, which guides the automated construction of culture-specific knowledge bases and corresponding evaluation datasets for any given language and culture. Experimental results demonstrate that our method can effectively evaluate cultural understanding. They also reveal that existing large language models lack comprehensive cultural competence, and that merely incorporating multilingual data does not necessarily enhance cultural understanding. All code and data files are available at https://github.com/HoganZinger/Culture