RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and substantial information loss inherent in conventional benchmarking for evaluating large language models’ (LLMs) knowledge capabilities, this paper proposes a lightweight evaluation paradigm grounded in raw reference data. Our method employs unsupervised clustering to construct structured knowledge units and integrates reference-driven, targeted question generation to enable scalable, cross-domain validation. Unlike traditional benchmarks, our framework eliminates reliance on manual annotation and task-specific re-engineering, thereby significantly improving evaluation efficiency and fidelity. Experimental results across four domains—world knowledge, programming, law, and biomedicine—demonstrate an average accuracy of 97.2%, while reducing resource consumption by 56.5% compared to standard benchmarking approaches. This work establishes a novel, efficient, interpretable, and low-overhead paradigm for LLM knowledge assessment.

📝 Abstract
As large language models (LLMs) advance, efficient knowledge evaluation becomes crucial to verifying their capabilities. Traditional methods, relying on benchmarks, face limitations such as high resource costs and information loss. We propose the Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model (RECKON), which directly uses reference data to evaluate models. RECKON organizes unstructured data into manageable units and generates targeted questions for each cluster, improving evaluation accuracy and efficiency. Experimental results show that RECKON reduces resource consumption by 56.5% compared to traditional methods while achieving over 97% accuracy across various domains, including world knowledge, code, legal, and biomedical datasets. Code is available at https://github.com/MikeGu721/reckon
Problem

Research questions and friction points this paper is trying to address.

Efficiently evaluate large language model knowledge
Reduce resource costs in model evaluation
Improve accuracy across diverse knowledge domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reference data for direct evaluation
Organizes data into manageable clusters
Generates targeted questions per cluster
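The three steps above can be sketched as a minimal pipeline: cluster raw reference passages into knowledge units, then build a question-generation prompt per unit. This is a hypothetical simplification for illustration (TF-IDF + k-means stand in for whatever clustering and prompting the paper actually uses); function names and prompt wording are assumptions, not RECKON's implementation.

```python
# Hypothetical sketch of a RECKON-style pipeline: cluster reference texts
# into knowledge units, then build one targeted question prompt per unit.
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_references(docs, n_clusters=2, seed=0):
    """Group reference passages into knowledge units via unsupervised clustering."""
    vectors = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(vectors)
    units = defaultdict(list)
    for doc, label in zip(docs, labels):
        units[int(label)].append(doc)
    return dict(units)


def make_question_prompt(unit_docs):
    """Build a prompt asking an LLM to generate a question grounded in this unit."""
    context = "\n".join(unit_docs)
    return f"Given the reference text below, write a factual question it answers.\n\n{context}"


# Toy reference data spanning two domains (programming and law).
docs = [
    "Python lists are mutable sequences.",
    "Tuples in Python are immutable.",
    "The Fourth Amendment limits searches and seizures.",
    "Miranda rights must be read upon arrest.",
]
units = cluster_references(docs, n_clusters=2)
prompts = [make_question_prompt(unit) for unit in units.values()]
```

In a full pipeline, each prompt would be sent to an LLM to produce evaluation questions, and the model under test would be scored against the reference passages that generated them.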
👥 Authors

Lin Zhang
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University

Zhouhong Gu
Fudan University
Language Modeling, Automated Society, Model Editing

Xiaoran Shi
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University

Hongwei Feng
Fudan University
Knowledge Management, AI, Big Data

Yanghua Xiao
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University