AI Summary
To address the high computational cost and self-assessment bias of large language models (LLMs) performing self-improvement on unverifiable open-ended tasks (e.g., machine translation), this paper proposes a lightweight semantic voting method that eliminates explicit self-evaluation. The core innovation extends conventional hard-matching majority voting to soft voting based on semantic similarity, leveraging an efficient sentence embedding model to generate high-quality pseudo-labels, thereby avoiding the computational overhead and overconfidence issues associated with self-scoring and entropy minimization. Experiments across diverse model architectures and translation benchmarks demonstrate significant improvements: an average inference speedup of 2.3×, with gains of +1.8 BLEU and +2.4 COMET. Moreover, the method generalizes well across tasks and models. This work establishes a more reliable and scalable unsupervised paradigm for LLM self-optimization.
Abstract
The rising cost of acquiring supervised data has driven significant interest in self-improvement for large language models (LLMs). Straightforward unsupervised signals such as majority voting have proven effective in generating pseudo-labels for verifiable tasks, whereas their applicability to unverifiable tasks (e.g., translation) is limited by the open-ended nature of responses. As a result, self-evaluation mechanisms (e.g., self-judging and entropy minimization) are predominantly used to derive pseudo-labels. However, self-evaluation with LLMs typically incurs high computational overhead and introduces overconfidence due to intrinsic biases. To address these challenges, we propose a novel self-evaluation-free approach for unverifiable tasks, designed for lightweight yet effective self-improvement. Inspired by the majority voting commonly employed in verifiable tasks, we introduce semantic voting, a mechanism that relaxes hard matching (i.e., exact matching) into soft matching (i.e., semantic similarity). Soft matching is realized with a lightweight sentence embedding model that quantifies semantic similarity, thereby avoiding the heavy computational burden and the intrinsic-bias limitations of self-evaluation. Comprehensive experiments demonstrate that our method achieves substantial gains in computational efficiency and overall better performance than self-evaluation methods across diverse model architectures and tasks.
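The semantic voting idea described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: each sampled response is embedded, and the response with the highest total cosine similarity to all other samples is selected as the pseudo-label. The bag-of-words `embed` function here is only a self-contained stand-in for the lightweight sentence embedding model the paper assumes; with it, exact duplicates score similarity 1 and unrelated responses score near 0, so the procedure reduces to conventional majority voting when responses match exactly.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words term counts. A real system would use
    # a lightweight sentence embedding model; this keeps the sketch runnable.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_vote(candidates):
    """Soft-matching analogue of majority voting: return the candidate with
    the highest summed semantic similarity to all other candidates."""
    embs = [embed(c) for c in candidates]
    scores = [
        sum(cosine(embs[i], embs[j]) for j in range(len(embs)) if j != i)
        for i in range(len(embs))
    ]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

# Exact duplicates behave like ordinary majority voting:
print(semantic_vote(["yes", "yes", "no"]))  # -> "yes"

# Near-paraphrases reinforce each other even without exact matches:
print(semantic_vote([
    "the cat sat on the mat",
    "a cat sat on a mat",
    "dogs bark loudly",
]))  # -> "the cat sat on the mat"
```

The key design point is that the pseudo-label is chosen by mutual agreement among samples rather than by asking the LLM to judge its own outputs, which is where the efficiency and bias advantages over self-evaluation come from.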