Self-Improving Language Models with Bidirectional Evolutionary Search

📅 2026-05-27
🤖 AI Summary
Current self-improvement methods for language models are limited by sparse verification signals and by candidate generation that relies primarily on autoregressive expansion, which makes it hard to reach low-probability but high-quality solutions. This work proposes Bidirectional Evolutionary Search (BES), a framework that couples forward candidate evolution with backward goal decomposition. The forward search adds evolution operators that recombine partial trajectories, producing candidates that a single model rollout is unlikely to generate; the theoretical motivation is that expansion-only candidates are confined to a narrow entropy shell, which these operators can escape. The backward search recursively decomposes the task into checkable subgoals, providing dense intermediate feedback that can exponentially reduce the number of samples needed to find a correct answer. On challenging post-training tasks where mainstream post-training algorithms fail to improve, BES yields consistent gains, and at inference time it outperforms existing open-source frameworks in both average and best-case performance on three open problem-solving benchmarks.
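A toy calculation can make the sample-reduction claim concrete. It is an illustrative assumption, not the paper's theorem: suppose a task splits into $K$ subgoals, and a single sample solves any one subgoal independently with probability $p$. The symbols $K$, $p$, $N_{\text{sparse}}$, and $N_{\text{dense}}$ are introduced here only for this sketch.

```latex
% Sparse feedback: only a complete solution can be verified, so one sample
% must get all K subgoals right at once (success probability p^K).
\mathbb{E}[N_{\text{sparse}}] = \frac{1}{p^{K}}

% Dense feedback: each subgoal is checkable on its own, so the subgoals can be
% solved one after another, each needing 1/p samples in expectation.
\mathbb{E}[N_{\text{dense}}] = \frac{K}{p}

% Example with p = 0.1 and K = 5:
\mathbb{E}[N_{\text{sparse}}] = 10^{5}, \qquad \mathbb{E}[N_{\text{dense}}] = 50
```

Under these assumptions the cost grows exponentially in $K$ with sparse verification but only linearly with checkable subgoals, which is the kind of gap the summary refers to.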
📝 Abstract
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, restricting exploration to regions with substantial model probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolutionary operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and on three open problem-solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
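The interplay between the two search directions can be sketched on a toy string-matching task. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`decompose`, `dense_score`, `expand`, `recombine`, `bidirectional_search`), the fixed-size chunking used as "goal decomposition", and the single-character resampling used in place of a model rollout are all hypothetical stand-ins.

```python
import random


def decompose(target, chunk):
    """Backward step (toy): split the goal into checkable subgoals.

    Each subgoal is (start index, expected slice). A real system would
    derive subgoals from the task itself; fixed chunks are an assumption.
    """
    return [(i, target[i:i + chunk]) for i in range(0, len(target), chunk)]


def dense_score(candidate, subgoals):
    """Dense feedback: the number of subgoals the candidate satisfies."""
    return sum(candidate[i:i + len(part)] == part for i, part in subgoals)


def expand(candidate, alphabet, rng):
    """Stand-in for a model rollout: resample a single position."""
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def recombine(a, b, rng):
    """Evolution operator: splice a prefix of `a` onto a suffix of `b`.

    Assumes both strings have the same length, which must be at least 2.
    """
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def bidirectional_search(target, alphabet="abcd", chunk=2,
                         pop_size=20, generations=200, seed=0):
    """Return (best candidate, generations used) for the toy task."""
    rng = random.Random(seed)
    subgoals = decompose(target, chunk)

    def score(candidate):
        return dense_score(candidate, subgoals)

    population = ["".join(rng.choice(alphabet) for _ in target)
                  for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=score, reverse=True)
        if score(population[0]) == len(subgoals):
            return population[0], gen  # every subgoal is satisfied
        parents = population[:pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            if rng.random() < 0.5:
                children.append(expand(rng.choice(parents), alphabet, rng))
            else:
                children.append(
                    recombine(rng.choice(parents), rng.choice(parents), rng))
        population = parents + children
    population.sort(key=score, reverse=True)
    return population[0], generations


if __name__ == "__main__":
    best, gens = bidirectional_search("abcdabcd")
    print(best, gens)
```

The design choice to notice is that selection uses the subgoal count rather than a single pass/fail check on the whole string, so partially correct candidates survive and `recombine` can merge their solved segments.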
Problem

Research questions and friction points this paper is trying to address.

self-improving language models
search limitations
autoregressive expansion
sparse verification signals
exploration constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional Evolutionary Search
self-improving language models
evolutionary operators
goal decomposition
dense feedback