🤖 AI Summary
This study addresses the insufficient evaluation of large language models' (LLMs) higher-order cognitive processing capabilities, particularly with respect to Taiwanese Hakka cultural knowledge, by proposing the first cognitively layered evaluation framework that integrates Bloom's Taxonomy with retrieval-augmented generation (RAG). Methodologically, it builds a fine-grained Hakka cultural knowledge base and combines semantic similarity computation with cognitive-level mapping to support hybrid automated and human assessment across six Bloom levels: remembering, understanding, applying, analyzing, evaluating, and creating. Its key contributions are twofold: (1) the first integration of a cognitive taxonomy with RAG for evaluating cultural intelligence, and (2) a two-dimensional assessment mechanism that jointly measures cultural relevance and semantic accuracy. Experiments on the Taiwanese Hakka Digital Archives show that the framework significantly improves the interpretability of assessments of LLMs' depth of cultural understanding and reveals systematic weaknesses on higher-order tasks, especially analysis and creation.
📝 Abstract
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
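The abstract does not say how semantic accuracy and cultural relevance are computed or aggregated per cognitive level. The following is a minimal sketch under stated assumptions, not the authors' method: bag-of-words cosine similarity stands in for the embedding-based semantic similarity a real system would more likely use, and cultural relevance is approximated by a hypothetical keyword-coverage score. `Item`, `cosine_similarity`, `cultural_relevance`, and `score_by_level` are illustrative names.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

# The six Bloom levels named in the abstract.
BLOOM_LEVELS = ("Remembering", "Understanding", "Applying",
                "Analyzing", "Evaluating", "Creating")


@dataclass
class Item:
    """One hypothetical benchmark item (field names are assumptions)."""
    level: str           # one of BLOOM_LEVELS
    reference: str       # reference answer, e.g. drawn from the archive
    response: str        # LLM-generated response
    cultural_terms: set  # single-token culture-specific terms a good answer should use


def tokenize(text):
    # Naive lowercase whitespace tokenization; a real system would do better.
    return text.lower().split()


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cultural_relevance(response, terms):
    """Hypothetical proxy: fraction of expected cultural terms present in the response."""
    if not terms:
        return 0.0
    tokens = set(tokenize(response))
    return sum(1 for t in terms if t.lower() in tokens) / len(terms)


def score_by_level(items):
    """Average (semantic accuracy, cultural relevance) for each Bloom level present."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_semantic, sum_cultural, count]
    for it in items:
        if it.level not in BLOOM_LEVELS:
            raise ValueError(f"unknown Bloom level: {it.level}")
        t = totals[it.level]
        t[0] += cosine_similarity(it.reference, it.response)
        t[1] += cultural_relevance(it.response, it.cultural_terms)
        t[2] += 1
    # Report levels in Bloom order, skipping levels with no items.
    return {lvl: (totals[lvl][0] / totals[lvl][2], totals[lvl][1] / totals[lvl][2])
            for lvl in BLOOM_LEVELS if lvl in totals}


if __name__ == "__main__":
    demo = [Item("Remembering", "the festival is held in spring",
                 "it is held in spring", {"festival"})]
    print(score_by_level(demo))
```

Keeping the two scores separate, rather than collapsing them into one number, mirrors the abstract's point that a response can be semantically close to a reference while still missing the culturally specific content. In the demo item the response overlaps heavily with the reference but omits the expected term, so it scores high on similarity and zero on relevance.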