🤖 AI Summary
Small language models (S-LLMs) exhibit weak logical reasoning in sequential decision-making tasks: under standard distillation they merely imitate the teacher's outputs, gaining neither interpretability nor generalization.
Method: We propose a logic distillation framework that elevates knowledge distillation from the output level to the logical structure level. A large language model (LLM) functionally decomposes complex instructions into reusable, composable discrete functions, forming a structured function library. We further design a state-driven, per-function reasoning mechanism and employ function-trajectory-supervised fine-tuning to enable decoupled transfer of decision logic.
Contribution/Results: Experiments demonstrate that S-LLMs trained via our framework match or surpass LLMs in multi-turn decision-making tasks, with substantial improvements in reasoning robustness and cross-task generalization. The code and datasets are publicly released.
📝 Abstract
Large language models (LLMs) have garnered increasing attention owing to their powerful comprehension and generation capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to smaller LLMs (S-LLMs) that can be deployed on a variety of devices. Knowledge distillation (KD) aims to endow S-LLMs with the capabilities of L-LLMs, but existing KD merely trains S-LLMs to mimic the outputs of L-LLMs, so they fail to acquire the decision-making capability needed in new situations. Consequently, S-LLMs are helpless in continuous decision-making tasks that require logical reasoning. To tackle these challenges, we propose a novel framework called Logic Distillation (LD). Initially, LD employs L-LLMs to instantiate complex instructions into discrete functions and illustrates their usage to establish a function base. Subsequently, LD fine-tunes S-LLMs on the function base so that they learn the logic employed by L-LLMs in decision-making. During testing, S-LLMs yield decision-making outcomes, function by function, based on the current state. Experiments demonstrate that with the assistance of LD, S-LLMs achieve outstanding results in continuous decision-making tasks, comparable to, or even surpassing, those of L-LLMs. The code and data for the proposed method are available for research purposes at https://github.com/Anfeather/Logic-Distillation.
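The state-driven, per-function decision loop described above could look roughly like the following minimal sketch. All names (`FUNCTION_BASE`, `s_llm_select_function`, `run_episode`) are hypothetical illustrations, not the paper's actual API, and the fine-tuned S-LLM's learned function selection is mocked by a simple rule.

```python
from typing import Callable, Dict

# Function base: discrete functions the L-LLM distilled from complex
# instructions (here, toy navigation primitives on a 1-D line).
FUNCTION_BASE: Dict[str, Callable[[dict], dict]] = {
    "move_toward_target": lambda s: {
        **s, "pos": s["pos"] + (1 if s["target"] > s["pos"] else -1)
    },
    "stay": lambda s: dict(s),
}

def s_llm_select_function(state: dict) -> str:
    """Stand-in for the fine-tuned S-LLM: maps the current state to the
    name of the next function to execute (learned logic, mocked here)."""
    return "stay" if state["pos"] == state["target"] else "move_toward_target"

def run_episode(state: dict, max_steps: int = 10) -> dict:
    """Continuous decision-making: the S-LLM yields outcomes function by
    function, re-conditioning each choice on the current state."""
    for _ in range(max_steps):
        name = s_llm_select_function(state)
        if name == "stay":
            break
        state = FUNCTION_BASE[name](state)
    return state

final = run_episode({"pos": 0, "target": 3})
print(final["pos"])  # reaches the target: 3
```

The point of the sketch is the decoupling: the function base fixes *what* actions exist, while the S-LLM only has to learn *which* function to apply in each state, which is the logic LD transfers via function-trajectory-supervised fine-tuning.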