RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-step NLP planning methods either rely on predefined fixed pipelines or on brute-force path enumeration, neglecting instance-level linguistic features and over-relying on the implicit reasoning capabilities of large language models (LLMs), which leads to poor generalization. Method: RLAP is a reinforcement learning based adaptive planning framework that formalizes multi-step tasks as Markov decision processes (MDPs). It employs a lightweight Actor network to model Q-values directly over natural language sequences, enabling dynamic generation of optimal subtask orders. Crucially, it integrates an LLM explicitly as the MDP's executor, making planning instance-aware and decoupling it from the LLM's internal reasoning. Contribution/Results: Evaluated on three categories of NLP tasks across multiple benchmarks, RLAP achieves an average 7.2% accuracy gain over strong baselines, including fixed-path and enumeration-based approaches, demonstrating superior generalization, robustness, and semantic awareness.

📝 Abstract
Multi-step planning has been widely employed to enhance the performance of large language models (LLMs) on downstream natural language processing (NLP) tasks: it decomposes the original task into multiple subtasks and guides LLMs to solve them sequentially without additional training. When addressing task instances, existing methods either preset the order of steps or attempt multiple paths at each step. However, these methods overlook the instances' linguistic features and rely on the intrinsic planning capabilities of LLMs to evaluate intermediate feedback and select subtasks, resulting in suboptimal outcomes. To better solve multi-step NLP tasks with LLMs, in this paper we propose a Reinforcement Learning enhanced Adaptive Planning framework (RLAP). In our framework, we model an NLP task as a Markov decision process (MDP) and embed an LLM directly in the environment. In particular, a lightweight Actor model is trained via reinforcement learning to estimate Q-values for natural language sequences consisting of states and actions. Therefore, during sequential planning, the linguistic features of each sequence in the MDP can be taken into account, and the Actor model interacts with the LLM to determine the optimal order of subtasks for each task instance. We apply RLAP to three different types of NLP tasks and conduct extensive experiments on multiple datasets to verify RLAP's effectiveness and robustness.
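The abstract's formulation can be sketched as a loop: an Actor scores natural-language (state, action) pairs with Q-values, greedily selects the next subtask, and the LLM, acting as the MDP environment, executes it to produce the next state. The sketch below is illustrative only, assuming a toy hash-based Q-function (`toy_q_value`), a stubbed executor (`llm_execute`), and made-up subtask names; none of these are the paper's implementation.

```python
from typing import Callable, List

def toy_q_value(state: str, action: str) -> float:
    # Stand-in for the lightweight Actor: scores the natural-language
    # (state, action) sequence. Here: a deterministic hash-like feature score.
    text = state + " || " + action
    return (sum((i + 1) * ord(c) for i, c in enumerate(text)) % 97) / 97.0

def llm_execute(state: str, subtask: str) -> str:
    # Stub for the LLM as MDP executor: consumes the chosen subtask
    # and returns the next state description.
    return state + f" -> done({subtask})"

def adaptive_plan(instance: str, subtasks: List[str],
                  q: Callable[[str, str], float]) -> List[str]:
    """Greedily order subtasks by Q-value, executing each step via the LLM."""
    state, remaining, order = instance, list(subtasks), []
    while remaining:
        best = max(remaining, key=lambda a: q(state, a))  # Actor picks next subtask
        order.append(best)
        remaining.remove(best)
        state = llm_execute(state, best)  # LLM transitions the MDP state
    return order

order = adaptive_plan("Translate then summarize this review.",
                      ["summarize", "translate", "extract-entities"], toy_q_value)
```

Because the Q-function conditions on the evolving state string, different task instances can yield different subtask orders, which is the instance-awareness the summary emphasizes.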
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-step NLP task solving with adaptive planning
Addressing suboptimal outcomes from preset or multi-path planning methods
Incorporating linguistic features via reinforcement learning for subtask ordering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning enhanced Adaptive Planning framework
Model NLP task as Markov decision process
Lightweight Actor model estimates Q-values