Optimizing Temperature for Language Models with Multi-Sample Inference

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of adaptively tuning the temperature parameter in multi-sample inference methods (e.g., best-of-N, majority voting), where optimal temperature selection is typically task- and data-dependent. We propose a fully automated, task-agnostic temperature optimization method that requires no labeled validation data. Our approach features: (1) an unsupervised evaluation metric based on output distribution entropy—replacing conventional supervised validation—and (2) a stochastic process model that explicitly characterizes the dynamic trade-off between sampling diversity and consistency as a function of temperature, enhancing interpretability. Extensive experiments across diverse large language models (Llama, Qwen, GPT series) and reasoning-intensive tasks—including mathematical reasoning, code generation, and commonsense question answering—demonstrate consistent and significant performance gains over fixed-temperature baselines. The method achieves robust improvements without any reliance on annotated data or task-specific calibration.
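The summary describes scoring temperatures by the entropy of the output distribution rather than by labeled validation accuracy. As a rough illustration only (the paper's actual selection criterion and its stochastic process model are not reproduced here), one plausible instantiation is to sample several final answers at each candidate temperature and prefer the temperature whose answers are most self-consistent, i.e., have the lowest empirical answer entropy. The `sample_fn` hook below is hypothetical.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_temperature(sample_fn, temperatures, n_samples=16):
    """Label-free temperature selection sketch: pick the temperature whose
    sampled answers have the lowest answer-distribution entropy.

    `sample_fn(temperature, n)` is a hypothetical hook that returns `n`
    final answers sampled from the model at the given temperature.
    """
    scores = {t: answer_entropy(sample_fn(t, n_samples)) for t in temperatures}
    return min(scores, key=scores.get)
```

Note that the paper frames temperature as a trade-off between sampling diversity and consistency, so a pure entropy minimizer is a simplification of whatever balance the full method strikes.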

📝 Abstract
Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are widely used in contemporary large language models (LLMs) to enhance predictive accuracy across various tasks. A key challenge in this process is temperature selection, which significantly impacts model performance. Existing approaches either rely on a fixed default temperature or require labeled validation data for tuning, which are often scarce and difficult to obtain. This paper addresses the challenge of automatically identifying the (near)-optimal temperature for different LLMs using multi-sample aggregation strategies, without relying on task-specific validation data. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. Furthermore, we propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines. Additionally, we incorporate a stochastic process model to enhance interpretability, offering deeper insights into the relationship between temperature and model performance.
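The two aggregation strategies the abstract names, majority voting and best-of-N sampling, can be sketched minimally as follows; the `score_fn` used for best-of-N (e.g., a learned verifier or reward model) is a hypothetical hook, not something specified by this paper.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled outputs."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(candidates, score_fn):
    """Return the candidate scoring highest under an external scoring
    function; `score_fn` stands in for a verifier or reward model."""
    return max(candidates, key=score_fn)
```

Both aggregators operate on samples drawn at some temperature, which is why the temperature choice the paper studies directly affects their accuracy.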
Problem

Research questions and friction points this paper is trying to address.

Selecting the sampling temperature for language models
Automating temperature tuning without labeled validation data
Evaluating performance without supervised validation signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated temperature optimization
Entropy-based metric
Stochastic process model
Weihua Du
LTI, Carnegie Mellon University
language models, reinforcement learning, embodied AI
Yiming Yang
Language Technologies Institute, Carnegie Mellon University
Sean Welleck
Language Technologies Institute, Carnegie Mellon University