Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation

📅 2025-11-03
🤖 AI Summary
This study addresses the limited evaluation of large language models' (LLMs) higher-order cognitive processing, particularly for Taiwanese Hakka cultural knowledge, by proposing the first cognitive-layered evaluation framework that integrates Bloom's Taxonomy with retrieval-augmented generation (RAG). Methodologically, it constructs a fine-grained Hakka cultural knowledge base and combines semantic similarity computation with cognitive-level mapping to support hybrid automated and human assessment across the six levels of Bloom's Taxonomy: remembering, understanding, applying, analyzing, evaluating, and creating. Its key contributions are twofold: (1) the first integration of a cognitive taxonomy with RAG for evaluating cultural intelligence, and (2) a two-dimensional assessment mechanism that jointly measures cultural relevance and semantic accuracy. Experiments on the Taiwanese Hakka Digital Archives indicate that the framework makes the depth of LLMs' cultural understanding easier to identify and interpret, and that it reveals systematic deficiencies in higher-order tasks, especially analysis and creation.

📝 Abstract
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures LLM-generated responses' semantic accuracy and cultural relevance.

Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural knowledge processing in large language models
Assessing model performance across six cognitive domains
Measuring semantic accuracy and cultural relevance of responses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Bloom's Taxonomy with Retrieval-Augmented Generation
Uses Taiwanese Hakka cultural archive as testbed
Measures semantic accuracy and cultural relevance
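
The two-dimensional scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words cosine similarity stands in for whatever semantic similarity measure the paper uses, the keyword-coverage score is an assumed proxy for cultural relevance, and the names `Item`, `cosine_similarity`, `cultural_relevance`, and `score_by_level` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re

# The six cognitive levels named in the abstract.
BLOOM_LEVELS = ["Remembering", "Understanding", "Applying",
                "Analyzing", "Evaluating", "Creating"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (an assumed stand-in for a
    real semantic similarity model such as sentence embeddings)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cultural_relevance(response, cultural_terms):
    """Fraction of expected cultural terms found in the response
    (an assumed proxy; the paper also involves human assessment)."""
    if not cultural_terms:
        return 0.0
    tokens = set(tokenize(response))
    hits = sum(1 for term in cultural_terms if term.lower() in tokens)
    return hits / len(cultural_terms)


@dataclass
class Item:
    level: str            # one of BLOOM_LEVELS
    reference: str        # reference answer from the knowledge base
    response: str         # LLM-generated answer
    cultural_terms: list  # single-word terms expected in a relevant answer


def score_by_level(items):
    """Return {level: (mean semantic accuracy, mean cultural relevance)}
    for every Bloom level that has at least one item."""
    buckets = {level: [] for level in BLOOM_LEVELS}
    for item in items:
        if item.level not in buckets:
            raise ValueError(f"unknown Bloom level: {item.level}")
        buckets[item.level].append((
            cosine_similarity(item.response, item.reference),
            cultural_relevance(item.response, item.cultural_terms),
        ))
    report = {}
    for level, scores in buckets.items():
        if scores:
            n = len(scores)
            report[level] = (sum(s for s, _ in scores) / n,
                             sum(c for _, c in scores) / n)
    return report
```

Reporting the two scores separately per level, rather than collapsing them into one number, keeps it visible whether a model fails on wording, on cultural content, or on both at a given cognitive level.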

Hung-Shin Lee
North Co., Ltd., Taiwan
Speech Processing
Chen-Chi Chang
Department of Cultural Creativity and Digital Marketing, National United University
Ching-Yuan Chen
Department of Cultural Creativity and Digital Marketing, National United University
Yun-Hsiang Hsu
Department of Cultural Creativity and Digital Marketing, National United University