ThetaEvolve: Test-time Learning on Open Problems

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses open mathematical optimization problems—such as circle packing and autocorrelation inequalities—by proposing a test-time learning framework tailored for small, open-source language models. The framework enables models to autonomously evolve programs during inference and surpass previously best-known bounds via contextual reasoning and reinforcement learning (RL). Methodologically, it introduces the first online, self-directed program search capability for small models without external fine-tuning, incorporating lazy-penalty regularization, batched sampling, and reward shaping to improve exploration efficiency and training stability. Experiments demonstrate that the framework achieves new best-known bounds on multiple benchmark tasks; its RL-driven policy consistently outperforms pure-reasoning baselines and exhibits cross-task generalization. Overall, this work establishes a scalable paradigm for lightweight models to autonomously discover novel mathematical solutions.

📝 Abstract
Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure inference system, so models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their experience in improving bounds on open optimization problems. ThetaEvolve features a single LLM, a large program database for enhanced exploration, batch sampling for higher throughput, lazy penalties to discourage stagnant outputs, and optional reward shaping for stable training signals. ThetaEvolve is the first evolving framework that enables a small open-source model, such as DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and the first autocorrelation inequality) mentioned in AlphaEvolve. Moreover, across two models and four open tasks, we find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines, and the model indeed learns evolving capabilities: RL-trained checkpoints show faster progress and better final performance on both the trained target task and unseen tasks. We release our code publicly: https://github.com/ypwang61/ThetaEvolve
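The abstract names the framework's main moving parts: a program database for exploration, batched sampling, a lazy penalty against stagnant outputs, and optional reward shaping. The sketch below is a minimal, hypothetical illustration of how one such evolve-loop iteration could fit together; it is not the paper's implementation, and all names (`theta_evolve_step`, `sample_batch`, `evaluate`) are placeholders standing in for the LLM sampler and the task-specific scorer.

```python
import random

def theta_evolve_step(database, sample_batch, evaluate, lazy_penalty=0.1):
    """One hypothetical ThetaEvolve-style iteration (illustrative only).

    database: list of (program, score) pairs kept for exploration.
    sample_batch: callable(parents) -> list of candidate programs,
        standing in for batched LLM sampling conditioned on parents.
    evaluate: callable(program) -> float score (higher is better).
    """
    # Draw parent programs from the database to condition generation on.
    parents = [p for p, _ in random.sample(database, k=min(2, len(database)))]
    candidates = sample_batch(parents)

    prev_best = max(s for _, s in database)
    for prog in candidates:
        score = evaluate(prog)
        # Lazy penalty: discourage candidates identical to a parent,
        # i.e. stagnant outputs that add no new exploration.
        if prog in parents:
            score -= lazy_penalty
        database.append((prog, score))

    # Reward shaping (optional): use the improvement over the previous
    # best, clipped at zero, as a stable training signal for RL.
    new_best = max(s for _, s in database)
    reward = max(0.0, new_best - prev_best)
    return database, reward
```

With a toy `evaluate` that scores one candidate above the current best, a single step returns the clipped improvement as the reward and grows the database by the batch size.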
Problem

Research questions and friction points this paper is trying to address.

Develops an open-source framework for evolving programs to solve open optimization problems
Enables test-time learning via in-context learning and reinforcement learning for continual improvement
Allows smaller models to achieve new best-known bounds on open mathematical problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source framework simplifies AlphaEvolve for test-time learning
Uses single LLM with program database and batch sampling for efficiency
Enables small models to achieve new bounds via reinforcement learning