GenSelect: A Generative Approach to Best-of-N

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing generative reward models for reasoning tasks rely on either pointwise scoring or pairwise comparison. The former underutilizes the strong comparative reasoning ability of large language models (LLMs), while the latter scales poorly under high sampling budgets. Method: We propose GenSelect, the first framework to embed a generative selection mechanism into the Best-of-N paradigm. An LLM (e.g., QwQ, DeepSeek-R1-0528) is guided via chain-of-thought prompting to directly select the best solution from N parallel candidate outputs. Contribution/Results: GenSelect draws on LLMs' comparative strengths and scales efficiently at test time. On mathematical reasoning benchmarks, it outperforms conventional pointwise and pairwise reward modeling using only simple prompts.

📝 Abstract
Generative reward models with parallel sampling have enabled effective test-time scaling for reasoning tasks. Current approaches employ pointwise scoring of individual solutions or pairwise comparisons. However, pointwise methods underutilize LLMs' comparative abilities, while pairwise methods scale inefficiently with larger sampling budgets. We introduce GenSelect, where the LLM uses long reasoning to select the best solution among N candidates. This leverages LLMs' comparative strengths while scaling efficiently across parallel sampling budgets. For math reasoning, we demonstrate that reasoning models, such as QwQ and DeepSeek-R1-0528, excel at GenSelect, outperforming existing scoring approaches with simple prompting.
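The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the `Best solution: <index>` answer format, the fallback to the first candidate, and the `llm` callable (any function mapping a prompt string to a response string) are all assumptions made for the example.

```python
import re
from typing import Callable, Optional, Sequence


def build_genselect_prompt(problem: str, candidates: Sequence[str]) -> str:
    """Put the problem and all N candidate solutions into one prompt.

    The wording and the answer format are hypothetical, not the paper's prompt.
    """
    parts = [f"Problem:\n{problem}\n"]
    for i, solution in enumerate(candidates):
        parts.append(f"Solution {i}:\n{solution}\n")
    parts.append(
        "Compare the solutions above, reason step by step, and finish with "
        "a line of the form 'Best solution: <index>'."
    )
    return "\n".join(parts)


def parse_selection(response: str, n: int) -> Optional[int]:
    """Return the last in-range index announced in the response, else None."""
    for match in reversed(re.findall(r"Best solution:\s*(\d+)", response)):
        index = int(match)
        if 0 <= index < n:
            return index
    return None


def genselect(problem: str, candidates: Sequence[str],
              llm: Callable[[str], str]) -> str:
    """Ask the model to pick one of the N candidates and return that candidate."""
    prompt = build_genselect_prompt(problem, candidates)
    choice = parse_selection(llm(prompt), len(candidates))
    # Assumption: fall back to the first candidate if no valid index is found.
    return candidates[choice if choice is not None else 0]
```

In use, `llm` would wrap a call to a reasoning model, and `candidates` would hold the N solutions sampled in parallel for the same problem.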
Problem

Research questions and friction points this paper is trying to address.

Pointwise scoring of individual solutions underutilizes LLMs' comparative abilities
Pairwise comparison scales inefficiently with larger sampling budgets
Selecting the best solution among N parallel candidates for math reasoning
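The abstract's point that pairwise comparison scales inefficiently with the sampling budget can be illustrated with a simple count of model calls. The counts below are assumptions for illustration only: pointwise scoring is taken as one call per candidate, pairwise comparison as one call per unordered pair, and generative selection as a single call with all N candidates in one prompt. The source does not specify these schemes.

```python
def pointwise_calls(n: int) -> int:
    """Assumed: one scoring call per candidate."""
    return n


def all_pairs_calls(n: int) -> int:
    """Assumed: one comparison call per unordered pair, n * (n - 1) / 2."""
    return n * (n - 1) // 2


def single_selection_calls(n: int) -> int:
    """Assumed: all n candidates fit in one prompt, so one call suffices."""
    return 1


for n in (4, 16, 64):
    print(n, pointwise_calls(n), all_pairs_calls(n), single_selection_calls(n))
# prints:
# 4 4 6 1
# 16 16 120 1
# 64 64 2016 1
```

Under these assumptions the pairwise count grows quadratically in N, while the single selection call grows only in prompt length.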
Innovation

Methods, ideas, or system contributions that make the work stand out.

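Generative selection within Best-of-N: the LLM picks the best of N parallel candidates
Long reasoning is used to compare candidates rather than score them individually
Reasoning models such as QwQ and DeepSeek-R1-0528 outperform existing scoring approaches with simple prompting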