AI Summary
Existing knowledge distillation methods for multi-step retrieval-augmented language models overlook the heterogeneous information requirements across reasoning steps, resulting in inefficient cross-stage knowledge transfer. To address this, we propose a stepwise knowledge distillation framework that employs step-level supervision and difficulty-aware training to dynamically align the teacher and student models' reasoning objectives and information granularity at each step, while integrating seamlessly with multi-step retrieval-augmented architectures and question decomposition strategies. This work introduces the first fine-grained knowledge distillation method explicitly designed for multi-hop reasoning processes. Evaluated on multi-hop question answering benchmarks, our approach substantially outperforms conventional distillation baselines: an 8B student model achieves performance comparable to a 70B teacher model, and the framework demonstrates broad compatibility with diverse multi-step reasoning architectures.
Abstract
Answering complex real-world questions requires step-by-step retrieval and integration of relevant information to generate well-grounded responses. However, existing knowledge distillation methods overlook the need for different reasoning abilities at different steps, hindering transfer in multi-step retrieval-augmented frameworks. To address this, we propose Stepwise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models (StepER). StepER employs step-wise supervision to align with evolving information and reasoning demands across stages. Additionally, it incorporates difficulty-aware training to progressively optimize learning by prioritizing suitable steps. Our method is adaptable to various multi-step retrieval-augmented language models, including those that use retrieval queries for reasoning paths or decomposed questions. Extensive experiments show that StepER outperforms prior methods on multi-hop QA benchmarks, with an 8B model achieving performance comparable to a 70B teacher model.
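The abstract does not specify the exact form of the loss, but the core idea of combining step-wise supervision with difficulty-aware weighting can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the per-step KL divergence, the function name `steper_loss`, and the particular weighting scheme are all assumptions introduced here for clarity.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def steper_loss(teacher_steps, student_steps, difficulty_weights):
    """Hypothetical stepwise distillation loss: a difficulty-weighted
    average of per-step divergences between the teacher's and student's
    output distributions at each reasoning step.

    teacher_steps / student_steps: one distribution per reasoning step.
    difficulty_weights: per-step weights prioritizing suitable steps
    (the paper's actual weighting scheme may differ).
    """
    assert len(teacher_steps) == len(student_steps) == len(difficulty_weights)
    weighted = sum(
        w * kl_divergence(t, s)
        for t, s, w in zip(teacher_steps, student_steps, difficulty_weights)
    )
    return weighted / sum(difficulty_weights)

# Toy example: 3 reasoning steps over a 4-token vocabulary.
teacher = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25], [0.1, 0.8, 0.05, 0.05]]
student = [[0.6, 0.2, 0.1, 0.1], [0.3, 0.3, 0.2, 0.2], [0.2, 0.6, 0.1, 0.1]]
weights = [1.0, 0.5, 2.0]  # hypothetical: harder steps weighted more heavily
loss = steper_loss(teacher, student, weights)
```

In practice the distributions would be next-token probabilities from the teacher and student models at each retrieval-and-reasoning step, and the weights would be scheduled over training rather than fixed.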