🤖 AI Summary
To address memory bottlenecks in large language model (LLM) fine-tuning caused by massive datasets, this paper proposes QLESS: an efficient data valuation and selection framework tailored for memory-constrained settings. Methodologically, it introduces gradient quantization into the LESS framework, combining LoRA-based random projection, low-bitwidth gradient quantization, and low-rank gradient similarity search. This design sharply reduces memory overhead while preserving high fidelity in data value estimation. Experiments on LLaMA, Mistral, and Qwen show that QLESS matches the data selection performance of the original LESS on benchmarks including MMLU, BBH, and TyDiQA, while reducing memory consumption by up to 16×; even 1-bit gradient quantization preserves data valuation quality. The core contribution is a high-fidelity, ultra-low-memory paradigm for subset selection, enabling scalable, resource-efficient LLM fine-tuning without compromising evaluation accuracy.
📝 Abstract
Fine-tuning large language models (LLMs) is often constrained by the computational costs of processing massive datasets. We propose **QLESS** (Quantized Low-rank Gradient Similarity Search), which integrates gradient quantization with the LESS framework to enable memory-efficient data valuation and selection. QLESS employs a two-step compression process: first, it obtains low-dimensional gradient representations through LoRA-based random projection; then, it quantizes these gradients to low-bitwidth representations. Experiments on multiple LLM architectures (LLaMA, Mistral, Qwen) and benchmarks (MMLU, BBH, TyDiQA) show that QLESS achieves comparable data selection performance to LESS while reducing memory usage by up to 16×. Even 1-bit gradient quantization preserves data valuation quality. These findings underscore QLESS as a practical, scalable approach to identifying informative examples within strict memory constraints.
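The two-step compression pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions `d` and `k`, the Gaussian projection matrix, the per-vector mean-absolute scale, and the cosine-similarity scoring are all illustrative assumptions standing in for the actual LoRA gradients and the LESS influence formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d = flattened (LoRA) gradient dimension, k = projected dimension.
d, k = 4096, 256

# Step 1 (sketch): random projection to k dimensions with a fixed Gaussian matrix,
# standing in for the LoRA-based low-rank projection used in the paper.
P = rng.standard_normal((k, d)) / np.sqrt(k)

def project(grad):
    return P @ grad

# Step 2 (sketch): 1-bit quantization - store only the sign of each coordinate
# plus one floating-point scale per vector. Packed as bits, this is roughly a
# 16x reduction versus fp16 storage.
def quantize_1bit(g):
    scale = np.abs(g).mean()
    return np.sign(g).astype(np.int8), scale

def dequantize(sign, scale):
    return sign.astype(np.float32) * scale

# Step 3 (sketch): value a training example by the cosine similarity between its
# compressed gradient and a target-task gradient, as in gradient-similarity search.
def score(train_grad, target_grad):
    a = dequantize(*quantize_1bit(project(train_grad)))
    b = project(target_grad)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# A training gradient aligned with the target should outscore an unrelated one,
# even after 1-bit quantization.
train_grad = rng.standard_normal(d)
target_grad = train_grad + 0.1 * rng.standard_normal(d)
unrelated = rng.standard_normal(d)
assert score(train_grad, target_grad) > score(unrelated, target_grad)
```

The sign-plus-scale scheme is why aggressive quantization can be nearly lossless for this task: data selection only needs the *ranking* of similarity scores, which is largely preserved by gradient direction alone.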