🤖 AI Summary
Language models often suffer from insufficient generation diversity, and merely raising the decoding temperature fails to effectively improve recall (coverage).
Method: This paper proposes a coverage-oriented training paradigm that explicitly incorporates the precision-recall (P-R) framework into the training objective. Specifically, it augments the standard negative log-likelihood loss with an explicit term that balances coverage (recall) against precision relative to the target distribution.
Contribution/Results: The method enables models to acquire high-coverage capability during training, thereby substantially enhancing the efficacy of temperature-based sampling. Experiments across multiple benchmarks demonstrate Pareto-optimal improvements in the P-R trade-off: recall increases by up to 12.3% without sacrificing generation quality—outperforming conventional temperature scaling and state-of-the-art diversity-regularization approaches.
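The exact objective is not given in this summary, but the described idea — standard NLL plus an explicit coverage term — can be sketched as follows. This is a hypothetical illustration, not the paper's actual loss: the coverage penalty below uses a reverse-KL-style term (large when the model assigns little mass to modes the target distribution supports), and the weight `lam` is an assumed hyperparameter.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def nll(model_probs, target_idx):
    # Standard negative log-likelihood of the observed target token.
    return -math.log(model_probs[target_idx])

def coverage_penalty(model_probs, target_probs, eps=1e-12):
    # Reverse-KL-style term D(p_target || p_model): it blows up when the
    # model fails to cover (assigns near-zero mass to) a supported mode.
    return sum(q * math.log((q + eps) / (p + eps))
               for q, p in zip(target_probs, model_probs) if q > 0)

def coverage_aware_loss(logits, target_probs, target_idx, lam=0.5):
    # Hypothetical combined objective: NLL for precision,
    # plus a weighted coverage term for recall.
    p = softmax(logits)
    return nll(p, target_idx) + lam * coverage_penalty(p, target_probs)
```

Under this toy formulation, a model that concentrates all mass on one mode of a multimodal target pays a large coverage penalty even while its NLL on that mode is low, which is the trade-off the P-R framing makes explicit.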
📝 Abstract
Increasing diversity in language models is a challenging yet essential objective. A common approach is to raise the decoding temperature. In this work, we investigate this approach through a simple yet common case to provide insights into why decreasing temperature can improve quality (Precision), while increasing it often fails to boost coverage (Recall). Our analysis reveals that for a model to be effectively tunable through temperature adjustments, it must be trained toward coverage. To address this, we propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling. These findings offer a pathway toward more versatile and robust language modeling techniques.
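The asymmetry the abstract describes can be made concrete with a toy example (not from the paper). Temperature scaling preserves the ranking of token logits: lowering the temperature concentrates mass on the model's top token (better precision), but raising it spreads mass over all tokens in rank order, so an under-covered target mode with a low logit can never receive mass without an off-target token with a higher logit receiving even more. The vocabulary and logit values below are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Toy 3-token vocabulary. Suppose the target supports tokens 0 and 1
# equally, but the model learned token 0 well, barely covers token 1,
# and gives an off-target token 2 a middling logit.
LOGITS = [4.0, -2.0, 1.0]

for t in (0.5, 1.0, 2.0, 5.0):
    p = softmax(LOGITS, t)
    # Ranking is preserved at every temperature: the off-target token 2
    # always outweighs the missed target mode (token 1), so raising t
    # buys "diversity" mostly in the form of precision loss.
    print(f"T={t}: p={['%.3f' % x for x in p]}")
```

Running this shows the under-covered mode's probability rising with temperature only in lockstep with (and always below) the off-target token's, which is why the paper argues the model must be trained toward coverage rather than tuned toward it at decoding time.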