Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation

📅 2025-11-03

🤖 AI Summary

This study addresses the lack of evaluation methods for the higher-order cognitive processing of large language models (LLMs), particularly with respect to Taiwanese Hakka cultural knowledge. It proposes a cognitively layered evaluation framework that integrates Bloom’s Taxonomy with retrieval-augmented generation (RAG). Methodologically, it constructs a fine-grained Hakka cultural knowledge base and combines semantic similarity computation with cognitive-level mapping to support hybrid automated and human assessment across the six levels of Bloom’s Taxonomy: remembering, understanding, applying, analyzing, evaluating, and creating. Its two key contributions are (1) the first integration of a cognitive taxonomy with RAG for cultural intelligence evaluation, and (2) a dual-dimension assessment mechanism that jointly measures cultural relevance and semantic accuracy. Experiments on the Taiwanese Hakka Digital Archives indicate that the framework makes the depth of an LLM’s cultural understanding more interpretable and reveal systematic deficiencies in higher-order tasks, especially analysis and creation.

📝 Abstract

This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.

Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural knowledge processing in large language models

Assessing model performance across six cognitive domains

Measuring semantic accuracy and cultural relevance of responses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Bloom's Taxonomy with Retrieval-Augmented Generation

Uses a Taiwanese Hakka cultural archive as the testbed

Measures semantic accuracy and cultural relevance
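
As a rough illustration of the two-dimensional scoring idea listed above, the sketch below scores one response at a given Bloom level on semantic accuracy and cultural relevance. It is not the paper's implementation: the summary does not specify the similarity metric or how cultural relevance is computed, so this sketch assumes a bag-of-words cosine similarity as a stand-in for semantic similarity and a simple term-coverage ratio for cultural relevance. The function names, the term list, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter

# The six cognitive levels named in the abstract.
BLOOM_LEVELS = [
    "Remembering", "Understanding", "Applying",
    "Analyzing", "Evaluating", "Creating",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (assumed stand-in for semantic similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cultural_relevance(response, cultural_terms):
    """Fraction of single-word cultural terms that appear in the response (assumed metric)."""
    if not cultural_terms:
        return 0.0
    tokens = set(tokenize(response))
    hits = sum(1 for term in cultural_terms if term.lower() in tokens)
    return hits / len(cultural_terms)


def score_response(level, response, reference, cultural_terms):
    """Score one response on both dimensions for a given Bloom level."""
    if level not in BLOOM_LEVELS:
        raise ValueError(f"unknown Bloom level: {level}")
    return {
        "level": level,
        "semantic_accuracy": cosine_similarity(response, reference),
        "cultural_relevance": cultural_relevance(response, cultural_terms),
    }


# Hypothetical example data, for illustration only.
result = score_response(
    "Remembering",
    response="Lei cha is a Hakka tea.",
    reference="Lei cha is a traditional Hakka ground tea.",
    cultural_terms=["lei", "cha", "hakka"],
)
print(result)
```

In practice an embedding-based similarity would likely replace the bag-of-words cosine, and the abstract's hybrid setup implies that human raters would review at least some of the scores, especially at the higher levels.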

🔎 Similar Papers

Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning