PEFT-Bench: A Benchmark for Parameter-Efficient Fine-Tuning Methods

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) incur high computational and environmental costs due to their massive parameter counts, limiting accessibility. While parameter-efficient fine-tuning (PEFT) methods reduce training overhead, existing evaluation frameworks suffer from limited model and dataset coverage and poor reproducibility. To address this, we introduce PEFT-Bench, the first unified, end-to-end benchmark for autoregressive LLMs, spanning 27 NLP datasets and six mainstream PEFT methods. We further propose the PEFT Soft Score Penalties (PSCP) metric, a multidimensional measure that jointly accounts for trainable parameter count, inference latency, and training memory consumption. Extensive experiments systematically characterize the accuracy-efficiency trade-offs across PEFT methods, improving benchmark comprehensiveness, fairness, transparency, and reproducibility.

📝 Abstract
Despite achieving state-of-the-art performance on many tasks, Large Language Models (LLMs) often incur high computational and environmental costs due to their massive scale, limiting their accessibility. Parameter-efficient fine-tuning (PEFT) methods address this challenge by reducing the number of trainable parameters while maintaining strong downstream performance. Despite the rapid development of PEFT methods, however, current evaluations remain limited in the models and datasets they cover and are difficult to reproduce. To bridge this gap, we introduce PEFT-Bench, a unified end-to-end benchmark for evaluating diverse PEFT methods on autoregressive LLMs. We demonstrate its usage across 27 NLP datasets and 6 PEFT methods. To account for different PEFT training and inference factors, we also introduce the PEFT Soft Score Penalties (PSCP) metric, which takes trainable parameters, inference speed, and training memory usage into account.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking parameter-efficient fine-tuning methods for large language models
Addressing limited and irreproducible evaluations of PEFT methods
Developing unified metrics for training and inference efficiency comparisons
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified, end-to-end benchmark for PEFT methods on autoregressive LLMs
Evaluates six mainstream PEFT methods across 27 NLP datasets
Introduces the PSCP metric, accounting for trainable parameters, inference speed, and training memory
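The paper does not spell out the PSCP formula in this summary, but its intent, a task score softly penalized by efficiency factors, can be illustrated with a sketch. Everything below is an assumption for illustration only: the function name, the multiplicative penalty form, and the specific penalty shapes are hypothetical, not the authors' actual metric.

```python
import math

def pscp_like_score(task_score, trainable_params, full_params,
                    latency_s, baseline_latency_s,
                    train_mem_gb, baseline_mem_gb):
    """Hypothetical PSCP-style score: a task score softly penalized by
    trainable-parameter share, inference latency, and training memory.
    The penalty shapes below are illustrative assumptions, not the
    published PSCP definition."""
    # Each penalty lies in (0, 1]; matching the full-model baseline on a
    # dimension incurs (almost) no penalty on that dimension.
    param_pen = math.exp(-trainable_params / full_params)
    latency_pen = baseline_latency_s / max(latency_s, baseline_latency_s)
    memory_pen = baseline_mem_gb / max(train_mem_gb, baseline_mem_gb)
    return task_score * param_pen * latency_pen * memory_pen

# Example: a LoRA-like method training ~0.5% of a 7B model's parameters,
# matching baseline latency and using less training memory (all numbers
# are made up for illustration).
score = pscp_like_score(task_score=0.82,
                        trainable_params=35e6, full_params=7e9,
                        latency_s=0.021, baseline_latency_s=0.021,
                        train_mem_gb=18.0, baseline_mem_gb=30.0)
```

A multiplicative form is one natural choice because any single efficiency regression drags the whole score down; the actual PSCP may weight or combine the three factors differently.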