🤖 AI Summary
To address the challenge of enhancing large language models' reasoning capabilities under scarce labeled data, this paper proposes a reinforcement learning framework for self-evolution that relies on minimal labeled data. Methodologically, it introduces a self-aware difficulty prediction mechanism and a limit-breaking strategy, enabling models to autonomously identify their capability boundaries and select appropriately challenging tasks. It integrates task self-generation, intrinsic self-evaluation feedback, and dynamic querying of external data to establish a meta-cognitive training loop. Evaluated on nine reasoning benchmarks, the approach achieves an average relative performance improvement of 53.8% while introducing less than 1.2% additional labeled data, significantly advancing data-efficient reasoning optimization. The core contribution lies in formalizing model "self-awareness" as a learnable capacity for difficulty estimation and boundary transcendence, thereby enabling continual autonomous evolution in few-shot settings.
📝 Abstract
Reinforcement learning (RL) has demonstrated potential in enhancing the reasoning capabilities of large language models (LLMs), but such training typically demands substantial effort in creating and annotating data. In this work, we explore improving LLMs through RL with minimal data. Our approach alternates between the LLM proposing a task and then attempting to solve it. To minimize data dependency, we introduce two novel mechanisms grounded in self-awareness: (1) self-aware difficulty prediction, where the model learns to assess task difficulty relative to its own abilities and prioritize challenging yet solvable tasks, and (2) self-aware limit breaking, where the model recognizes when a task is beyond its capability boundary and proactively requests external data to break through that limit. Extensive experiments on nine benchmarks show a 53.8% relative improvement with less than 1.2% extra data, demonstrating the efficacy of self-aware RL and underscoring the promise of self-evolving agent training.
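The propose-then-solve loop with the two self-awareness mechanisms can be sketched as a toy simulation. Everything below is a hypothetical illustration, not the paper's implementation: `propose_task`, `predict_success`, `attempt`, the scalar `capability`, and all thresholds are invented stand-ins for the LLM's task generator, difficulty predictor, solver, and update rule.

```python
import random

random.seed(0)

capability = 0.6       # toy stand-in for the model's latent skill, in [0, 1]
TARGET_SUCCESS = 0.5   # prefer tasks solved about half the time
BOUNDARY = 0.1         # predicted-success floor that triggers limit breaking

def propose_task():
    """Task self-generation: a task is just a difficulty level in [0, 1]."""
    return {"difficulty": random.random()}

def predict_success(task):
    """Self-aware difficulty prediction: estimated probability of solving
    the task, relative to the model's own current capability."""
    return max(0.0, 1.0 - task["difficulty"] / capability)

def attempt(task):
    """Attempt the task; success is stochastic in the skill/difficulty gap."""
    return random.random() < max(0.0, capability - task["difficulty"] + 0.5)

solved = 0
external_requests = 0
for step in range(200):
    # Propose candidates and keep the one whose predicted success is
    # closest to the target: "challenging yet solvable".
    candidates = [propose_task() for _ in range(8)]
    task = min(candidates,
               key=lambda t: abs(predict_success(t) - TARGET_SUCCESS))
    if predict_success(task) < BOUNDARY:
        # Self-aware limit breaking: the task lies beyond the capability
        # boundary, so request external labeled data instead of attempting.
        external_requests += 1
        capability = min(1.0, capability + 0.01)
        continue
    if attempt(task):
        solved += 1
        # Intrinsic self-evaluation feedback nudges capability upward.
        capability = min(1.0, capability + 0.002)

print(f"solved={solved} external_requests={external_requests}")
```

The point of the sketch is the control flow, not the numbers: task selection is driven by the model's own difficulty estimate, and external data is consumed only when that estimate signals the task is out of reach, which is how the framework keeps extra labeled data to a small fraction of training.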