🤖 AI Summary
Existing generative reward models for reasoning tasks rely on either pointwise scoring or pairwise comparison: the former underutilizes large language models' (LLMs) strong comparative reasoning ability, while the latter scales poorly at high sampling budgets. Method: GenSelect embeds a generative selection mechanism into the Best-of-N paradigm, prompting an LLM (e.g., QwQ or DeepSeek-R1-0528) with chain-of-thought instructions to reason over all N parallel candidate outputs and directly select the best one. Contribution/Results: GenSelect exploits LLMs' strength at semantic comparison while scaling efficiently at test time. On mathematical reasoning benchmarks, GenSelect significantly outperforms conventional pointwise and pairwise reward modeling using only lightweight prompts, demonstrating its effectiveness, simplicity, and scalability.
📝 Abstract
Generative reward models with parallel sampling have enabled effective test-time scaling for reasoning tasks. Current approaches employ pointwise scoring of individual solutions or pairwise comparisons. However, pointwise methods underutilize LLMs' comparative abilities, while pairwise methods scale inefficiently with larger sampling budgets. We introduce GenSelect, where the LLM uses long reasoning to select the best solution among N candidates. This leverages LLMs' comparative strengths while scaling efficiently across parallel sampling budgets. For math reasoning, we demonstrate that reasoning models, such as QwQ and DeepSeek-R1-0528, excel at GenSelect, outperforming existing scoring approaches with simple prompting.
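The selection loop described above can be sketched in a few lines: format all N candidates into one prompt, ask the model to reason and name the best solution, then parse out the chosen index. This is a minimal illustration, not the paper's actual prompt or code; the prompt wording, the `query_llm` callable, and the `Best solution:` answer format are assumptions made for the sketch.

```python
import re

def build_genselect_prompt(problem, candidates):
    """Assemble one prompt asking the model to compare all N
    candidate solutions and pick the best (GenSelect-style).
    The exact wording here is illustrative, not the paper's prompt."""
    numbered = "\n\n".join(
        f"Solution {i}:\n{sol}" for i, sol in enumerate(candidates)
    )
    return (
        f"Problem:\n{problem}\n\n"
        f"Below are {len(candidates)} candidate solutions.\n\n"
        f"{numbered}\n\n"
        "Reason step by step about which is correct, then end your "
        "answer with 'Best solution: <index>'."
    )

def parse_selection(response, n):
    """Extract the chosen index from the model's final answer line;
    fall back to candidate 0 if no valid index is found."""
    m = re.search(r"Best solution:\s*(\d+)", response)
    if m is None:
        return 0
    idx = int(m.group(1))
    return idx if 0 <= idx < n else 0

def best_of_n(problem, candidates, query_llm):
    """query_llm is any callable prompt -> text (e.g., an API wrapper
    around a reasoning model such as QwQ)."""
    prompt = build_genselect_prompt(problem, candidates)
    choice = parse_selection(query_llm(prompt), len(candidates))
    return candidates[choice]

# Toy run with a stubbed "model" that always names solution 1.
picked = best_of_n(
    "Compute 2 + 2.",
    ["2 + 2 = 5.", "2 + 2 = 4."],
    lambda prompt: "...long reasoning... Best solution: 1",
)
print(picked)  # → 2 + 2 = 4.
```

Note the contrast with pointwise or pairwise reward modeling: a single selection call covers all N candidates, so the number of judge calls stays constant as the sampling budget grows, rather than growing linearly (pointwise) or quadratically (all-pairs comparison).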