🤖 AI Summary
This work addresses the high computational cost of evaluating large language models, which typically requires running inference on large numbers of benchmark samples. To enable efficient evaluation, the authors formulate the problem as a sparse optimization task, leveraging the inherent sparsity of the model–item performance matrix. They introduce a novel approach that optimizes anchor weights by gradient descent and integrates task-aware anchor selection with a candidate importance scoring mechanism, iteratively refining the set of anchors. By combining the representational capacity of multilayer perceptrons (MLPs) with the proposed optimization strategy, the method achieves low estimation error and high Kendall's τ across multiple benchmarks, significantly improving the efficiency, robustness, and practicality of large language model evaluation.
📝 Abstract
As large language models (LLMs) continue to scale up, their performance on various downstream tasks has significantly improved. However, evaluating their capabilities has become increasingly expensive, as performing inference on a large number of benchmark samples incurs high computational costs. In this paper, we revisit the model–item performance matrix and show that it exhibits sparsity, that representative items can be selected as anchors, and that efficient benchmarking can be formulated as a sparse optimization problem. Based on these insights, we propose SparseEval, a method that, for the first time, adopts gradient descent to optimize anchor weights and employs an iterative refinement strategy for anchor selection. We leverage the representational capacity of an MLP to handle the sparse optimization, and propose the Anchor Importance Score and Candidate Importance Score to evaluate the value of each item for task-aware refinement. Extensive experiments demonstrate the low estimation error and high Kendall's τ of our method across a variety of benchmarks, showcasing its superior robustness and practicality in real-world scenarios. Code is available at https://github.com/taolinzhang/SparseEval.
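The core idea of anchor-based evaluation can be illustrated with a minimal sketch. The code below is not the paper's method: it substitutes a linear anchor weighting for the MLP, uses a synthetic IRT-style performance matrix instead of real benchmark scores, and picks anchors randomly rather than by the proposed importance scores. It only demonstrates the underlying mechanics: optimize anchor weights by gradient descent so that a few anchor items predict each model's full-benchmark score, then check estimation error and Kendall's τ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic model-item performance matrix: rows = models, cols = items.
# Scores follow a simple IRT-style model (sigmoid of ability - difficulty);
# this is a stand-in for a real benchmark's score matrix.
n_models, n_items, n_anchors = 30, 200, 10
ability = rng.normal(size=(n_models, 1))
difficulty = rng.normal(size=(1, n_items))
Y = 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Ground truth: each model's average score over the full benchmark.
true_score = Y.mean(axis=1)

# Select anchor items (randomly here; the paper instead refines this
# set iteratively using importance scores).
anchors = rng.choice(n_items, size=n_anchors, replace=False)
X = Y[:, anchors]

# Optimize anchor weights w by gradient descent on squared error
# (the paper uses an MLP; a linear model keeps the sketch minimal).
w = np.full(n_anchors, 1.0 / n_anchors)  # start from uniform averaging
lr = 0.1
for _ in range(2000):
    residual = X @ w - true_score
    grad = 2.0 * X.T @ residual / n_models
    w -= lr * grad

pred = X @ w
mae = np.abs(pred - true_score).mean()  # estimation error

def kendall_tau(a, b):
    """Kendall's tau via pairwise concordance (O(n^2), fine for n=30)."""
    n = len(a)
    conc = sum(
        np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return conc / (n * (n - 1) / 2)

tau = kendall_tau(pred, true_score)
```

Even this simplified setup typically recovers the full-benchmark ranking from a handful of items, because item scores are highly correlated across models; that correlation is the sparsity the paper exploits.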