Big Reasoning with Small Models: Instruction Retrieval at Inference Time

📅 2025-10-15
🤖 AI Summary
Small language models (SLMs) have limited capability on tasks that need multi-step reasoning or domain-specific knowledge. To address this, the authors propose an inference-time instruction retrieval framework that requires no fine-tuning. The method builds a structured instruction library, generated by GPT-5, and at inference time retrieves the instructions most semantically relevant to the question based on question embeddings, so the model follows explicit reasoning steps instead of generating them from scratch. The core idea is to move external knowledge injection into a lightweight, plug-and-play retrieval step, which preserves data privacy, keeps computational overhead low, and reduces environmental impact. The evaluation reports consistent gains of 9.4% on MedQA, 7.9% on MMLU Law, and 5.1% on MathQA, covering medical, legal, and mathematical domains.

📝 Abstract
Can we bring large-scale reasoning to local-scale compute? Small language models (SLMs) are increasingly attractive because they run efficiently on local hardware, offering strong privacy, low cost, and reduced environmental impact. Yet they often struggle with tasks that require multi-step reasoning or domain-specific knowledge. We address this limitation through instruction intervention at inference time, where an SLM retrieves structured reasoning procedures rather than generating them from scratch. Our method builds an Instruction Corpus by grouping similar training questions and creating instructions via GPT-5. During inference, the SLM retrieves the most relevant instructions and follows their steps. Unlike retrieval-augmented generation, which retrieves text passages, instruction retrieval gives the model structured guidance for reasoning. We evaluate this framework on MedQA (medical board exams), MMLU Professional Law, and MathQA using models from 3B to 14B parameters without any additional fine-tuning. Instruction retrieval yields consistent gains: 9.4% on MedQA, 7.9% on MMLU Law, and 5.1% on MathQA. Concise instructions outperform longer ones, and the magnitude of improvement depends strongly on model family and intrinsic reasoning ability.
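The inference-time step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the bag-of-words `embed` function stands in for whatever embedding model the paper uses, and the corpus entries, field names (`topic`, `steps`), and prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase token counts, not a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical instruction corpus; real entries would be written offline by a larger model.
INSTRUCTION_CORPUS = [
    {"topic": "patient chest pain diagnosis",
     "steps": ["List the key symptoms.", "Rule out emergencies.", "Pick the best match."]},
    {"topic": "contract breach remedies law",
     "steps": ["Identify the parties.", "State the governing rule.", "Apply it to the facts."]},
    {"topic": "speed distance time word problem",
     "steps": ["Write down known quantities.", "Choose the formula.", "Solve and check units."]},
]

def retrieve(question, corpus, k=1):
    """Return the k corpus entries whose topic is most similar to the question."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda e: cosine(q, embed(e["topic"])), reverse=True)
    return ranked[:k]

def build_prompt(question, corpus, k=1):
    """Prepend the retrieved steps to the question before it is sent to the SLM."""
    lines = []
    for entry in retrieve(question, corpus, k):
        lines.append("Steps:")
        lines.extend(f"{i}. {s}" for i, s in enumerate(entry["steps"], start=1))
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The design point this illustrates is that the retrieved item is a procedure to follow rather than a text passage to read, which is the contrast with retrieval-augmented generation drawn in the abstract.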
Problem

Research questions and friction points this paper is trying to address.

Enhancing small models' reasoning without fine-tuning
Retrieving structured instructions for complex domain tasks
Improving multi-step reasoning on medical and legal questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieves structured reasoning instructions at inference time
Builds an instruction corpus by grouping similar training questions and generating instructions with GPT-5
Uses concise instructions without additional model fine-tuning
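The corpus-building step (grouping similar training questions before asking a larger model for one instruction per group) might look something like the sketch below. The source does not say how questions are grouped, so the greedy pass, the Jaccard token overlap, and the `threshold` value here are assumptions chosen only to keep the example self-contained.

```python
import re

def tokens(text):
    """Lowercase word set used as a crude similarity signal (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_questions(questions, threshold=0.3):
    """Greedy grouping: attach each question to the first group whose
    representative (its first member) is similar enough, else start a new group."""
    groups = []
    for q in questions:
        for g in groups:
            if jaccard(tokens(q), tokens(g[0])) >= threshold:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups

def instruction_request(group):
    """Hypothetical prompt asking an instruction-writing model for concise steps;
    the actual model call is deliberately left out."""
    examples = "\n".join(f"- {q}" for q in group)
    return "Write concise step-by-step instructions for questions like:\n" + examples
```

Each group would then yield one stored instruction, which keeps the library small relative to the training set.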
Kenan Alkiek
School of Information, University of Michigan, Ann Arbor, MI 48109, USA
David Jurgens
Associate Professor, School of Information and Dept. of Computer Science, University of Michigan
Natural Language Processing, Computational Social Science, Computational Sociolinguistics
Vinod Vydiswaran
School of Information, University of Michigan, Ann Arbor, MI 48109, USA