Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of factual hallucinations in the intermediate reasoning steps of small language models in resource-constrained settings, where conventional outcome-based reinforcement learning can erroneously reinforce unfaithful reasoning paths whenever the final answer happens to be correct. To mitigate this issue, the authors propose FaithRL, which introduces step-level faithfulness supervision into reinforcement learning. FaithRL employs a process reward model to deliver explicit step-level faithfulness rewards and incorporates a truncation-based resampling strategy that generates implicit contrastive signals, guiding the model to learn from faithful reasoning prefixes. Experimental results demonstrate that FaithRL significantly reduces hallucinations in both reasoning chains and final answers across multiple small language models and open-domain question answering benchmarks, enhancing the faithfulness and reliability of model reasoning.
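The summary's core idea, combining an outcome reward with explicit per-step faithfulness scores from a process reward model, can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `step_level_reward`, `prm_score`, and the weight `alpha` are assumed names and hyperparameters.

```python
def step_level_reward(steps, final_correct, prm_score, alpha=0.5):
    """Blend per-step faithfulness scores with an outcome reward (sketch).

    steps: list of reasoning-step strings
    final_correct: whether the final answer matches the reference
    prm_score: callable mapping a step to a faithfulness score in [0, 1]
    alpha: illustrative weight on the outcome reward (an assumption)
    """
    faith = [prm_score(s) for s in steps]  # explicit step-level rewards
    mean_faith = sum(faith) / len(faith) if faith else 0.0
    outcome = 1.0 if final_correct else 0.0
    # A correct final answer alone no longer earns full reward when
    # intermediate steps are scored as unfaithful.
    return alpha * outcome + (1 - alpha) * mean_faith

# Toy usage: two faithful steps, one hallucinated step, correct final answer.
scores = {"step1": 0.9, "step2": 0.8, "step3": 0.1}
r = step_level_reward(list(scores), True, scores.get)
```

Under this toy scoring, the hallucinated `step3` drags the reward below the maximum even though the final answer is correct, which is the failure mode the summary says outcome-only rewards cannot penalize.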

📝 Abstract
As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
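The abstract's "implicit truncated resampling strategy" can be sketched as truncating a chain-of-thought at the first step the process reward model scores as unfaithful, then resampling continuations from the remaining faithful prefix to form contrastive signals. This is a minimal sketch under assumed names (`truncated_resample`, `generate`, the `threshold` cutoff), not the paper's implementation.

```python
def truncated_resample(steps, prm_score, generate, threshold=0.5, k=4):
    """Return (faithful_prefix, k resampled continuations) — a sketch.

    steps: original chain-of-thought as a list of step strings
    prm_score: callable mapping a step to a faithfulness score in [0, 1]
    generate: callable sampling a continuation from a prefix (assumed)
    threshold: illustrative faithfulness cutoff (an assumption)
    """
    prefix = []
    for s in steps:
        if prm_score(s) < threshold:
            break  # truncate at the first unfaithful step
        prefix.append(s)
    # Resample k continuations from the faithful prefix; paired with the
    # original unfaithful suffix, these serve as implicit contrastive signals.
    continuations = [generate(prefix) for _ in range(k)]
    return prefix, continuations
```

A training loop would then favor the resampled faithful continuations over the original unfaithful suffix, so the model learns to extend faithful prefixes rather than being rewarded only when the final answer is correct.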
Problem

Research questions and friction points this paper is trying to address.

faithfulness hallucinations
small reasoning models
chain-of-thought
step-level reasoning
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

step-level reinforcement learning
faithfulness-aware
hallucination mitigation
chain-of-thought reasoning
small reasoning models
Shuo Nie
Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China; Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd
Hexuan Deng
Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China; Zhongguancun Academy, Beijing, China
Chao Wang
Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd
Ruiyu Fang
Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd
Xuebo Liu
Associate Professor of Computer Science, Harbin Institute of Technology, Shenzhen
Large Language Models · Natural Language Processing · Machine Translation
Shuangyong Song
Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
Yu Li
College of Integrated Circuits, Zhejiang University, Hangzhou, Zhejiang, China
Min Zhang
Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd