SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Small language models (SLMs) struggle to deliver high-quality, personalized educational feedback in privacy-sensitive, resource-constrained settings. Method: a multi-hop, tool-augmented reasoning framework that integrates structured chain-of-thought reasoning, self-reflection mechanisms, and domain-specific tool invocation, designed for local deployment. The approach uses two-stage LoRA fine-tuning on GPT-4o-synthesized data to distil these capabilities into open-weight 3B/8B models, improving knowledge alignment and error recovery. Results: the 8B model matches GPT-4o and Llama-3.3 70B on feedback relevance and actionability, and a user study confirms strong student acceptance of its pedagogical feedback quality. The work demonstrates a lightweight, explainable, interactive, and robust educational QA system for privacy-preserving deployment.

📝 Abstract
Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limited computational resources, and the need for pedagogically valid responses. These constraints require small, open-source models that can run locally and reliably ground their outputs in correct information. We introduce SCRIBE, a framework for multi-hop, tool-augmented reasoning designed to generate valid responses to student questions about feedback reports. SCRIBE combines domain-specific tools with a self-reflective inference pipeline that supports iterative reasoning, tool use, and error recovery. We distil these capabilities into 3B and 8B models via two-stage LoRA fine-tuning on synthetic GPT-4o-generated data. Evaluation with a human-aligned GPT-Judge and a user study with 108 students shows that 8B-SCRIBE models achieve comparable or superior quality to much larger models in key dimensions such as relevance and actionability, while being perceived on par with GPT-4o and Llama-3.3 70B by students. These findings demonstrate the viability of SCRIBE for low-resource, privacy-sensitive educational applications.
Problem

Research questions and friction points this paper is trying to address.

Addressing privacy concerns in educational feedback systems
Overcoming computational limitations for local model deployment
Ensuring pedagogical validity in automated student responses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-hop tool-augmented reasoning for explanations
Two-stage LoRA fine-tuning on synthetic data
Self-reflective pipeline with error recovery
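
The three innovation bullets above describe a single control flow: multi-hop reasoning interleaved with tool calls, checked by a self-reflection step that can trigger error recovery. A minimal Python sketch of that loop is below; the tool name `lookup_report_section`, the toy report data, and the hard-coded hop plan are all illustrative stand-ins, not components from the paper (in SCRIBE the fine-tuned SLM itself generates the reasoning steps and tool calls).

```python
def lookup_report_section(section: str) -> str:
    """Hypothetical domain tool: fetch part of a student's feedback report."""
    report = {"time_management": "Low activity in weeks 3-4.",
              "quiz_scores": "Average quiz score: 62%."}
    return report.get(section, "section not found")

def reflect(step: str) -> bool:
    """Self-reflection check: reject a step grounded in a failed tool call."""
    return "not found" not in step

def scribe_answer(question: str, max_hops: int = 3) -> list[str]:
    """Multi-hop loop: plan -> call tool -> reflect -> recover or continue."""
    trace = []
    # A real system lets the model pick tools and arguments each hop; this
    # fixed plan includes one bad section name to exercise error recovery.
    plan = ["time_mgmt", "time_management", "quiz_scores"]
    for hop, section in enumerate(plan[:max_hops]):
        observation = lookup_report_section(section)
        step = f"hop {hop}: {section} -> {observation}"
        if reflect(step):
            trace.append(step)
        else:
            trace.append(f"hop {hop}: recovered from failed call '{section}'")
    return trace

trace = scribe_answer("Why is my quiz average low?")
```

The reflect-then-recover branch is the part the paper's two-stage fine-tuning is meant to instil: the first failed lookup is caught and the loop continues instead of grounding the final answer in a bad observation.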