🤖 AI Summary
Existing LoRA variants report conflicting performance improvements on the same benchmarks, making it difficult to assess their effectiveness reliably. This work systematically investigates the inconsistency and identifies batch size, often treated as a minor implementation detail, as a first-order design factor that significantly biases evaluation outcomes. To address this, the authors propose a low-cost batch size tuning strategy based on proxy metrics and jointly analyze the interplay among rank, dataset size, and model capacity. Experiments demonstrate that, with proper tuning, standard LoRA can match the performance of more complex variants; this reconciles previously contradictory findings and makes evaluations of LoRA variants substantially more reliable and reproducible.
📝 Abstract
Low-rank adaptation (LoRA) is a standard approach for fine-tuning large language models, yet its many variants report conflicting empirical gains, often on the same benchmarks. We show that these contradictions arise from a single overlooked factor: the batch size. When properly tuned, vanilla LoRA often matches the performance of more complex variants. We further propose a proxy-based, cost-efficient strategy for batch size tuning, revealing the impact of rank, dataset size, and model capacity on the optimal batch size. Our findings elevate batch size from a minor implementation detail to a first-order design parameter, reconciling prior inconsistencies and enabling more reliable evaluations of LoRA variants.
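For readers unfamiliar with the parameterization the abstract refers to, the following is a minimal sketch of a vanilla LoRA layer (not code from this paper): a frozen weight matrix `W` is augmented by a trainable low-rank update `(alpha/r) * B @ A`, where the rank `r` is the hyperparameter the paper studies alongside batch size. The class name `LoRALinear` and the NumPy formulation are illustrative choices, not part of the work described above.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-augmented linear layer.

    The base weight W stays frozen; only the low-rank factors A and B
    would be trained during fine-tuning. B starts at zero, so the layer
    initially behaves exactly like the frozen base layer.
    """

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weight (random here, for illustration only).
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors: A has a small random init, B is zero.
        self.A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
        self.B = np.zeros((d_out, r))
        # Standard LoRA scaling of the low-rank path.
        self.scale = alpha / r

    def __call__(self, x):
        # Frozen path plus scaled low-rank adaptation path.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=64, d_out=32, r=8)
x = np.ones((4, 64))
# Because B is zero-initialized, the adapted output equals the base output.
assert np.allclose(layer(x), x @ layer.W.T)
```

With `r` much smaller than `d_in` and `d_out`, only `r * (d_in + d_out)` parameters per layer are trained, which is what makes LoRA cheap to fine-tune; the paper's point is that the batch size used when training these factors shapes reported results as strongly as choices like `r` itself.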