SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluations of LLM understanding predominantly focus on output correctness while neglecting whether a model's reasoning process aligns with human expert cognition, undermining assessment reliability. Method: We propose SCOP, a cognitive-process-oriented evaluation framework for language understanding grounded in cognitive psychology. SCOP defines five core comprehension skills, provides a strict framework for constructing test data for these skills, and integrates human annotation with controllable generation techniques. Contribution/Results: Through fine-grained behavioral analysis, we conduct a systematic empirical study across major open-source and closed-source LLMs. Our findings reveal a critical unreliability phenomenon, "correct answers with flawed reasoning": models resemble experts in some respects (e.g., comprehending local information better than global information) yet fail to achieve expert-level process consistency. SCOP establishes both a theoretical foundation and an empirical benchmark for understanding-driven model training and evaluation.

📝 Abstract
Despite the great potential of large language models (LLMs) in machine comprehension, it remains risky to fully rely on them in real-world scenarios. This is probably because there is no rational explanation of whether the comprehension process of LLMs aligns with that of experts. In this paper, we propose SCOP to carefully examine how LLMs perform during the comprehension process from a cognitive view. Specifically, it is equipped with a systematic definition of five requisite skills in the comprehension process, a strict framework for constructing testing data for these skills, and a detailed analysis of advanced open-source and closed-source LLMs using the testing data. With SCOP, we find that it is still challenging for LLMs to perform an expert-level comprehension process. Even so, we notice that LLMs share some similarities with experts, e.g., performing better at comprehending local information than global information. Further analysis reveals that LLMs can be somewhat unreliable: they might reach correct answers through flawed comprehension processes. Based on SCOP, we suggest that one direction for improving LLMs is to focus more on the comprehension process, ensuring all comprehension skills are thoroughly developed during training.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM comprehension alignment with expert cognitive processes
Assessing LLM performance on five requisite comprehension skills
Identifying unreliable LLM reasoning despite correct answers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic definition of five comprehension skills
Strict framework for testing data construction
Detailed analysis of open and closed-source LLMs
Yongjie Xiao
Sichuan University, China; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, China
Hongru Liang
Sichuan University
Peixin Qin
Sichuan University, China; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, China
Yao Zhang
School of Statistics and Data Science, AAIS, Nankai University, Tianjin, China
Wenqiang Lei
Sichuan University, China; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, China