Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Search-augmented large language models (LLMs) lack robustness in multi-hop reasoning due to decomposition errors, retrieval failures, and inference mistakes. To address this, the paper proposes Erasable Reinforcement Learning (ERL), a novel framework that introduces a dynamic "identify–erase–locally regenerate" mechanism within the reasoning chain, enabling real-time localization and correction of erroneous reasoning steps to halt error propagation and shifting the paradigm from fragile to resilient reasoning. The method integrates a search-augmented architecture with end-to-end reinforcement learning, optimizing full-chain reliability without requiring additional human annotations. Evaluated on the HotpotQA and MuSiQue benchmarks, ERL improves exact match (EM) by +8.48% and +5.38% and F1 by +11.56% and +7.22% for the 3B- and 7B-scale models respectively, significantly surpassing state-of-the-art approaches.

📝 Abstract
While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; retrieval misses, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure in any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a novel framework that transforms fragile reasoning into a robust process. ERL explicitly identifies faulty steps, erases them, and regenerates reasoning in place, preventing defective logic from propagating through the reasoning chain. This targeted correction mechanism turns brittle reasoning into a more resilient process. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle, with the 3B model achieving +8.48% EM and +11.56% F1, and the 7B model achieving +5.38% EM and +7.22% F1 over previous state-of-the-art (SOTA) results. These findings suggest that erasable reinforcement learning provides a powerful paradigm shift for robust multi-step reasoning in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Addresses decomposition errors in multi-hop reasoning
Mitigates retrieval misses, where key evidence fails to be retrieved
Corrects reasoning errors to prevent flawed logic from propagating
Innovation

Methods, ideas, or system contributions that make the work stand out.

Erasable Reinforcement Learning identifies faulty reasoning steps
It erases and regenerates defective logic in reasoning chains
Targeted correction prevents error propagation in multi-hop reasoning
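The identify–erase–regenerate mechanism described above can be illustrated with a minimal sketch. Note this is a hypothetical outline, not the paper's implementation: `generate_step` and `is_faulty` are stand-in stubs for what ERL realizes as an LLM policy and a learned verifier trained end-to-end with reinforcement learning.

```python
def generate_step(context, hop):
    # Stub generator: in ERL this would be an LLM call conditioned
    # on the question, prior steps, and retrieved evidence.
    return f"step-{hop} using {len(context)} prior steps"

def is_faulty(step):
    # Stub verifier: in ERL a learned signal flags decomposition,
    # retrieval, or reasoning errors in the candidate step.
    return "error" in step

def erl_reason(question, max_hops=3, max_retries=2):
    """Build a reasoning chain, erasing and regenerating faulty steps
    in place instead of letting errors propagate downstream."""
    chain = []
    for hop in range(max_hops):
        step = generate_step(chain, hop)
        retries = 0
        while is_faulty(step) and retries < max_retries:
            # Erase the faulty step and regenerate it locally;
            # the rest of the chain is left untouched.
            step = generate_step(chain, hop)
            retries += 1
        chain.append(step)
    return chain

chain = erl_reason("Who directed the film that won Best Picture in 1998?")
print(chain)  # a list of 3 retained steps
```

The key design point is that correction is local: only the flagged step is replaced, so earlier valid steps are preserved and later steps never condition on defective logic.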
Ziliang Wang (ImVision)
Kang An (ImVision, Shenzhen University)
Xuhui Zheng (ImVision, Nanjing University)
Faqiang Qian (ImVision)
Weikun Zhang (ImVision)
Cijun Ouyang (ImVision)
Jialu Cai (ImVision)
Yuhang Wang (ImVision)
Yichao Wu (SenseTime Group Limited)
AGI, LLM, Computer Vision, Face recognition