Evolutionary System 2 Reasoning: An Empirical Proof

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can spontaneously acquire human-like System 2 reasoning (deliberate, sequential, and reflective inference) through evolutionary mechanisms, rather than merely improving task-specific performance. Method: We propose Evolutionary Reasoning Optimization (ERO), a framework that maintains a population of LLMs, employs a quantitative reasoning score as the fitness metric, and iteratively optimizes model parameters via evolutionary operators (selection, mutation, crossover). Contribution/Results: Experiments reveal that mainstream LLMs exhibit weak System 2 reasoning capabilities; remarkably, weaker models (e.g., Qwen-7B) rapidly develop strong reasoning ability after only a few ERO cycles. On multiple standard reasoning benchmarks (e.g., GSM8K, MMLU, ARC), ERO enables low-parameter models to achieve high reasoning performance, a leapfrog improvement. Crucially, this is the first empirical validation of an unsupervised, task-agnostic pathway for evolving reasoning capability, establishing a novel paradigm for studying the emergence of general intelligence.
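The summary above describes ERO's loop: initialize a population, score each individual, and apply selection, mutation, and crossover to maximize the best individual's reasoning score. A minimal, runnable Python sketch of that population loop is shown below; it is illustrative only, substituting a toy numeric fitness function for the paper's LLM reasoning score, and all function names and hyperparameters here are assumptions, not taken from the ERO codebase.

```python
import random

def reasoning_score(params):
    # Stand-in fitness: the paper evaluates LLMs on benchmarks such as
    # GSM8K to get a quantitative reasoning score. Here we score a plain
    # parameter vector (optimum at all-0.5) so the loop is runnable.
    return -sum((p - 0.5) ** 2 for p in params)

def mutate(params, rate=0.3, scale=0.05):
    # Gaussian perturbation applied to each gene with probability `rate`.
    return [p + random.gauss(0, scale) if random.random() < rate else p
            for p in params]

def crossover(a, b):
    # Uniform crossover: each gene is inherited from either parent.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def ero(pop_size=20, dim=8, generations=50, elite=4, seed=0):
    random.seed(seed)
    pop = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest `elite` individuals as parents.
        pop.sort(key=reasoning_score, reverse=True)
        parents = pop[:elite]
        # Variation: refill the population with mutated offspring.
        children = []
        while len(children) < pop_size - elite:
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        pop = parents + children
    return max(pop, key=reasoning_score)

best = ero()
```

In the actual framework the "genes" would be LLM parameters (or another searchable representation of a model) and fitness evaluation would require running each candidate on a reasoning test suite, which is the expensive step this toy version elides.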

📝 Abstract
Machine intelligence marks the ultimate dream of making machines' intelligence comparable to that of human beings. While recent progress in Large Language Models (LLMs) shows substantial task-specific skill across a wide array of downstream tasks, these models more or less fall short in general intelligence. Following the correlation between intelligence and System 2 reasoning (slow thinking), in this paper we aim to answer a worthwhile research question: could machine intelligence such as LLMs be evolved to acquire reasoning ability (not a specific skill), just like human beings? To this end, we propose the Evolutionary Reasoning Optimization (ERO) framework, which performs survival of the fittest over a population of LLMs to search for individuals with strong reasoning ability. Given a reasoning task, ERO first initializes multiple LLMs as a population, after which an evolutionary strategy evolves the population to maximize the quantified reasoning score of the best individual. Based on experiments on representative test suites, we report two surprising empirical discoveries: i) the latest LLMs, such as GPT-5, still show limited System 2 reasoning ability; ii) with the simple evolution loop of ERO, a relatively weak model (Qwen-7B) can be enhanced to exhibit powerful reasoning ability. Our project can be accessed at https://github.com/MetaEvo/ERO for reproduction.
Problem

Research questions and friction points this paper is trying to address.

Evolve LLMs to acquire human-like reasoning ability
Enhance weak models with evolutionary reasoning optimization
Address limited System 2 reasoning in advanced LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary reasoning optimization framework for LLMs
Survival of the fittest strategy to enhance reasoning
Population-based evolution to maximize reasoning scores
Zeyuan Ma
South China University of Technology
Meta-Black-Box Optimization, Reinforcement Learning, Learning to Optimize
Wenqi Huang
Technical University of Munich
Image Reconstruction, Magnetic Resonance Imaging, Implicit Neural Representations
Guo-Huan Song
Zhejiang Normal University, Northern Computility
Hongshu Guo
South China University of Technology, Panorama Optimization
Sijie Ma
South China University of Technology
Zhiguang Cao
Singapore Management University
Learning to Optimize, Neural Combinatorial Optimization, Computational Intelligence
Yue-Jiao Gong
South China University of Technology