ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of unsupervised selection of a high-quality output from multiple stochastic generations in open-ended tasks. The authors propose ModeX, a framework that generalizes majority voting to the semantic level: it constructs a similarity graph over candidate texts and recursively applies spectral clustering to identify the modal output that represents the semantic consensus, without requiring external evaluators or exact string matches. A lightweight variant, ModeX-Lite, further improves efficiency through an early-pruning strategy. Evaluated across diverse tasks including text summarization, code generation, and mathematical reasoning, ModeX consistently outperforms both single-path and multi-path baselines while remaining computationally efficient. Notably, the approach enables semantic-level mode extraction without additional models or inference overhead.

๐Ÿ“ Abstract
Selecting a single high-quality output from multiple stochastic generations remains a fundamental challenge for large language models (LLMs), particularly in open-ended tasks where no canonical answer exists. While Best-of-N and self-consistency methods show that aggregating multiple generations can improve performance, existing approaches typically rely on external evaluators, reward models, or exact string-match voting, limiting their applicability and efficiency. We propose Mode Extraction (ModeX), an evaluator-free Best-of-N selection framework that generalizes majority voting to open-ended text generation by identifying the modal output representing the dominant semantic consensus among generated texts. ModeX constructs a similarity graph over candidate generations and recursively applies spectral clustering to select a representative centroid, without requiring additional inference or auxiliary models. We further instantiate this selection principle as ModeX-Lite, an improved version of ModeX with early pruning for efficiency. Across open-ended tasks -- including text summarization, code generation, and mathematical reasoning -- our approaches consistently outperform standard single- and multi-path baselines, providing a computationally efficient solution for robust open-ended text generation. Code is released at https://github.com/deeplearning-wisc/ModeX.
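The core selection principle described in the abstract can be sketched compactly. The following is a minimal illustration, not the authors' implementation: it assumes a precomputed symmetric similarity matrix over the N candidate texts (e.g. cosine similarities of sentence embeddings), performs a single spectral bisection via the Fiedler vector rather than the paper's full recursive clustering, and the function name `modal_output` is hypothetical.

```python
import numpy as np

def modal_output(sims):
    """Select the candidate at the centre of the dominant semantic cluster.

    sims: (n, n) symmetric similarity matrix over candidate generations.
    Returns the index of the selected "modal" candidate.
    """
    n = sims.shape[0]
    if n <= 2:
        # Too few candidates to split; take the one most similar to the rest.
        return int(np.argmax(sims.sum(axis=1)))
    # Unnormalised graph Laplacian L = D - W of the similarity graph.
    degrees = sims.sum(axis=1)
    lap = np.diag(degrees) - sims
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue is the Fiedler vector.
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]
    # Spectral bisection: split candidates by the Fiedler vector's sign,
    # then keep the larger (majority-consensus) side.
    pos = fiedler >= 0
    idx = np.where(pos)[0] if pos.sum() >= (~pos).sum() else np.where(~pos)[0]
    # Centroid of the dominant cluster: the member most similar to the others.
    sub = sims[np.ix_(idx, idx)]
    return int(idx[np.argmax(sub.sum(axis=1))])
```

With a toy matrix where candidates 0-2 agree semantically and candidate 3 is an outlier, the bisection isolates the outlier and the centroid of the majority cluster is returned. The full method would recurse on the dominant cluster until it is sufficiently tight, and ModeX-Lite would additionally prune weak candidates early.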
Problem

Research questions and friction points this paper is trying to address.

open-ended generation
Best-of-N selection
output selection
large language models
semantic consensus
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mode Extraction
evaluator-free selection
spectral clustering
open-ended generation
Best-of-N