Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

📅 2026-02-16
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work proposes Evolutionary System Prompt Learning (E-SPL), which integrates an evolutionary mechanism for system prompts with reinforcement learning (RL) to jointly improve the autonomous learning and generalization of large language models (LLMs) on reasoning and agent tasks. Within each RL iteration, E-SPL evaluates multiple system prompts in parallel, uses TrueSkill ratings to guide LLM-driven prompt mutation and crossover, and updates model weights conditioned on each prompt, explicitly separating declarative knowledge encoded in prompts from procedural knowledge embedded in model parameters. On the easy-to-hard AIME → BeyondAIME generalization benchmark, E-SPL improves the success rate from 38.8% to 45.1%, substantially outperforming a purely reflective prompt-evolution baseline (40.0%) and demonstrating better sample efficiency and generalization.
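A minimal sketch of the loop just described, under stated assumptions: `run_rollouts`, `rl_update`, `llm_mutate`, and `llm_crossover` are hypothetical stand-ins for the paper's rollout, prompt-conditioned weight update, and LLM-driven variation operators, and a simple running score replaces the full TrueSkill rating (sketched separately after the abstract below). This illustrates the iteration structure only; it is not the authors' implementation.

```python
import random
from typing import Callable

def espl_iteration(
    prompts: list[str],
    scores: dict[str, float],
    run_rollouts: Callable[[str], float],      # mean reward of rollouts under a prompt
    rl_update: Callable[[str, float], None],   # prompt-conditioned weight update
    llm_mutate: Callable[[str], str],          # LLM rewrites one parent prompt
    llm_crossover: Callable[[str, str], str],  # LLM combines two parent prompts
) -> list[str]:
    """One E-SPL-style iteration: rollouts, RL updates, then prompt evolution."""
    # 1) Evaluate every system prompt in the population (in parallel in the paper).
    rewards = {p: run_rollouts(p) for p in prompts}

    # 2) Apply an RL update to the shared model weights, conditioned on each prompt.
    for p, r in rewards.items():
        rl_update(p, r)

    # 3) Score prompts by relative performance within this batch
    #    (a running score here; the paper uses TrueSkill ratings).
    for rank, p in enumerate(sorted(prompts, key=lambda q: -rewards[q])):
        scores[p] = scores.get(p, 0.0) + (len(prompts) - rank)

    # 4) Evolutionary step: keep the top half, refill the population with
    #    LLM-driven mutations and crossovers of the survivors.
    survivors = sorted(prompts, key=lambda p: -scores[p])[: max(2, len(prompts) // 2)]
    children: list[str] = []
    while len(survivors) + len(children) < len(prompts):
        if random.random() < 0.5:
            children.append(llm_mutate(random.choice(survivors)))
        else:
            children.append(llm_crossover(*random.sample(survivors, 2)))
    return survivors + children
```

Note that step 2 conditions the weight update on each prompt, so the model and the prompt population co-adapt within the same RL batch, which is the coupling the summary highlights.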

📝 Abstract
Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. It applies RL updates to model weights conditioned on each system prompt, and evolutionary updates to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves RL success rate from 38.8% $\rightarrow$ 45.1% while also outperforming reflective prompt evolution (40.0%). Overall, our results show that coupling reinforcement learning with system prompt evolution yields consistent gains in sample efficiency and generalization. Code: https://github.com/LunjunZhang/E-SPL
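The abstract's selection mechanism, updating each prompt's TrueSkill rating from its relative rank within an RL iteration batch, can be sketched with the open-source `trueskill` Python package. The prompt strings and batch rewards below are invented for illustration; only the rating-update pattern follows the description above.

```python
import trueskill

# One TrueSkill rating per system prompt in the population.
env = trueskill.TrueSkill(draw_probability=0.0)
prompts = [
    "You are a careful mathematician. Verify every step.",
    "Solve the problem, then double-check the final answer.",
    "Reason step by step and keep track of sub-goals.",
]
ratings = {p: env.create_rating() for p in prompts}

# Hypothetical mean batch rewards from one RL iteration's rollouts.
batch_rewards = {prompts[0]: 0.38, prompts[1]: 0.31, prompts[2]: 0.45}

# Rank prompts by reward within the batch (rank 0 = best) and update
# their ratings from this relative ordering, one update per RL iteration.
ordered = sorted(prompts, key=lambda p: -batch_rewards[p])
ranks = [ordered.index(p) for p in prompts]
new_groups = env.rate([(ratings[p],) for p in prompts], ranks=ranks)
for p, (r,) in zip(prompts, new_groups):
    ratings[p] = r
    print(f"mu={r.mu:5.2f} sigma={r.sigma:4.2f}  {p[:40]!r}")
```

One plausible benefit of rating from within-batch ranks rather than raw rewards is that ratings stay comparable across iterations even as the policy, and hence the reward scale, shifts during RL training.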
Problem

Research questions and friction points this paper is trying to address.

Evolutionary System Prompt Learning
Reinforcement Learning
Large Language Models
Self-improvement
Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary System Prompt Learning
Reinforcement Learning
System Prompt Evolution
TrueSkill-based Selection
LLM-driven Mutation
👥 Authors
Lunjun Zhang
University of Toronto
Artificial intelligence · Robotics
Ryan Chen
Northwestern University
Bradly C. Stadie
Department of Statistics and Data Science, Northwestern University; Bridgewater AIA Labs