🤖 AI Summary
Large language models (LLMs) often exhibit suboptimal performance in mathematical reasoning during inference, and existing approaches frequently rely on costly fine-tuning or external supervision.
Method: This paper proposes a training-free test-time evolution framework: it parallelly generates a population of candidate solutions, applies a genetic algorithm paradigm—comprising selection, mutation, and iterative population update—and employs an LLM-driven “evolutionary prompt” to enable autonomous population dynamics, culminating in majority-voting for final answer aggregation.
Contributions/Results: It is the first work to systematically integrate genetic algorithms into LLM test-time reasoning. The method unifies mainstream test-time scaling techniques under a coherent framework and jointly designs population evolution mechanisms with convergence criteria. Evaluated on multiple mathematical reasoning benchmarks, it achieves state-of-the-art accuracy, reduces solution variance by 42%, and cuts per-query average API call cost by 35%.
📝 Abstract
Test-time scaling has emerged as a promising direction for enhancing the reasoning capabilities of Large Language Models in last few years. In this work, we propose Population-Evolve, a training-free method inspired by Genetic Algorithms to optimize LLM reasoning. Our approach maintains a dynamic population of candidate solutions for each problem via parallel reasoning. By incorporating an evolve prompt, the LLM self-evolves its population in all iterations. Upon convergence, the final answer is derived via majority voting. Furthermore, we establish a unification framework that interprets existing test-time scaling strategies through the lens of genetic algorithms. Empirical results demonstrate that Population-Evolve achieves superior accuracy with low performance variance and computational efficiency. Our findings highlight the potential of evolutionary strategies to unlock the reasoning power of LLMs during inference.