🤖 AI Summary
This paper theoretically compares two paradigms for adapting large language models (LLMs) to new tasks, supervised fine-tuning (SFT) and Best-of-N (BoN) selection, in the context of bit-string generation. It establishes the first rigorous convergence-rate analysis under both realizable and unrealizable settings, characterizing fundamental trade-offs between response length and sample size. Under realizability, SFT converges faster, with milder dependence on the response length. Under unrealizability, BoN can achieve either a superior convergence rate or greater robustness to increasing response length, depending on the failure mode. The analysis integrates probabilistic modeling, reward modeling, and sequential decision-making theory, yielding a theoretically grounded framework for comparing the convergence properties of LLM adaptation methods.
📝 Abstract
Using the bit-string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, supervised fine-tuning (SFT), trains a new next-token predictor on good generations. The second, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that SFT outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in the sample size n or a rate of convergence with better dependence on the response length.
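The Best-of-N procedure described above can be illustrated with a minimal sketch. The `generate` and `reward` functions here are toy stand-ins (a uniform sampler over bit strings and a bit-counting score), not the paper's actual base model or learned reward model:

```python
import random

def best_of_n(generate, reward, n):
    """Best-of-N selection: draw n candidate responses from the
    (unaltered) base model and return the one the reward model
    scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)

# Toy instantiation for the bit-string setting (illustrative only):
# the "base model" samples uniform bit strings of length L, and the
# "reward model" simply counts the number of 1-bits.
L = 8
generate = lambda: tuple(random.randint(0, 1) for _ in range(L))
reward = lambda s: sum(s)

best = best_of_n(generate, reward, n=16)
```

As n grows, the selected string concentrates on high-reward responses, which is the mechanism whose convergence rate (in n and in the response length) the paper analyzes.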