A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the data inefficiency of reinforcement learning (RL)-based post-training for large language models (LLMs), which stems from two sources: scarce high-quality external supervision and a limited volume of model-generated experience. The survey presents a systematic literature review, which the authors describe as the first dedicated to RL for LLMs under data scarcity. It proposes a bottom-up hierarchical taxonomy built around three complementary perspectives: data-centric, training-centric, and framework-centric. For each category it summarizes representative methods and analyzes their strengths and limitations, with the aim of clarifying the design space of data-efficient RL for LLMs and providing a roadmap toward more efficient and scalable RL-based post-training.

📝 Abstract

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning

Large Language Models

Data Scarcity

Post-training

Data Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

Large Language Models

Data Scarcity