AI Summary
Existing preference alignment methods rely on intricate hyperparameter tuning and a reference model, incurring high computational overhead and implementation complexity. This paper proposes SimPER, a hyperparameter-free and reference-model-free minimalist preference alignment framework. SimPER uniformly models preferences over chosen and rejected responses by optimizing the inverse perplexity of model responses, eliminating the need for KL regularization, reward modeling, or auxiliary loss terms. It trains end-to-end on the preference dataset by optimizing the exponentiated average log-likelihood of each response, drastically reducing tuning effort and implementation burden. Empirical evaluation on MT-Bench, AlpacaEval 2, and ten benchmarks of the Open LLM Leaderboard demonstrates that SimPER achieves state-of-the-art performance. Notably, it attains an improvement of up to 5.7 points on AlpacaEval 2 and the best average ranking across the Open LLM Leaderboard benchmarks.
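In notation, inverse perplexity is the exponentiated length-averaged log-likelihood of a response. A plausible form of the objective described above is sketched below; the symbols $y_w$ and $y_l$ for the chosen and rejected responses, and the exact sign structure of the loss, are our labels and assumptions rather than quotations from the paper:

```latex
% Inverse perplexity of a response y given prompt x under policy \pi_\theta:
% the exponentiated average per-token log-likelihood.
\mathrm{PPL}^{-1}(y \mid x)
  = \exp\!\left(\frac{1}{|y|} \sum_{t=1}^{|y|} \log \pi_\theta\big(y_t \mid x, y_{<t}\big)\right)

% Sketch of a SimPER-style loss: raise the inverse perplexity of the
% chosen response y_w and lower that of the rejected response y_l.
\mathcal{L}(\theta)
  = -\,\mathrm{PPL}^{-1}(y_w \mid x) \;+\; \mathrm{PPL}^{-1}(y_l \mid x)
```

Note there is no reference policy, no KL term, and no temperature or margin hyperparameter in this expression, which is the source of the method's simplicity.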
Abstract
Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.
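The inverse-perplexity objective described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function names (`inverse_perplexity`, `simper_loss`) and the exact loss form (maximize inverse perplexity of the chosen response, minimize that of the rejected one) are our assumptions, and per-token log-probabilities are taken as given rather than computed from a model.

```python
import math

def inverse_perplexity(token_logprobs):
    """Inverse perplexity = exp of the length-averaged log-likelihood.

    `token_logprobs` is a list of per-token log p(y_t | x, y_<t) values
    under the policy; averaging over length makes the score
    length-normalized.
    """
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def simper_loss(chosen_logprobs, rejected_logprobs):
    """Hypothetical SimPER-style loss: reward high inverse perplexity on
    the chosen response and penalize it on the rejected response.
    Note: no reference model, no KL term, no tunable hyperparameters.
    """
    return (-inverse_perplexity(chosen_logprobs)
            + inverse_perplexity(rejected_logprobs))

# Toy example: the policy assigns higher per-token log-probs to the
# chosen response than to the rejected one, so the loss is negative.
chosen = [-0.5, -0.3, -0.7]    # per-token log-probs (illustrative values)
rejected = [-2.0, -1.5, -2.5]
loss = simper_loss(chosen, rejected)
```

In a real training loop the per-token log-probabilities would come from the policy's forward pass, and the loss would be backpropagated directly; the averaging inside the exponent keeps the objective comparable across responses of different lengths.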