🤖 AI Summary
Small language models (SLMs, ≤7B parameters) exhibit limited performance on complex reasoning tasks requiring sparse domain expertise.
Method: This paper proposes a neuro-symbolic collaborative distillation framework that decouples and transfers general cognitive capabilities from large language model (LLM) teachers and specialized knowledge via two parallel pathways: (i) a neural pathway distilling generic reasoning abilities, and (ii) a symbolic pathway constructing an interpretable, editable structured knowledge base using logic rules and knowledge graphs—enabling human-in-the-loop refinement. The framework integrates knowledge distillation, symbolic logic representation, knowledge graph construction, and lightweight fine-tuning.
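The two-pathway split can be illustrated with a minimal sketch (all names here are hypothetical, not from the paper's code): the distilled student SLM supplies general reasoning, while rare task-specific knowledge lives in an editable symbolic store consulted at inference time.

```python
# Minimal sketch of the neuro-symbolic collaboration idea
# (hypothetical names, not the paper's actual implementation).

# Symbolic pathway: an editable KB mapping task topics to specialized rules.
symbolic_kb = {
    "modular arithmetic": "To compute a mod n, subtract the largest multiple of n not exceeding a.",
    "unit conversion": "Convert all quantities to one unit before adding.",
}

def retrieve_rules(question: str) -> list[str]:
    """Look up specialized knowledge relevant to the question (simple keyword match)."""
    return [rule for key, rule in symbolic_kb.items() if key in question.lower()]

def build_prompt(question: str) -> str:
    """Neural pathway: the student SLM sees the question plus any retrieved rules."""
    rules = retrieve_rules(question)
    context = "\n".join(f"Rule: {r}" for r in rules)
    return f"{context}\nQuestion: {question}\nAnswer:" if rules else f"Question: {question}\nAnswer:"

# Human-in-the-loop refinement: edit the KB directly, no retraining needed.
symbolic_kb["prime factorization"] = "Divide by the smallest primes first (2, 3, 5, ...)."

prompt = build_prompt("What is 17 mod 5? (modular arithmetic)")
```

The design point this sketch captures is that updating the symbolic KB changes behavior immediately, whereas knowledge baked into the student's weights would require further fine-tuning to revise.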
Results: On the BBH and GSM8K benchmarks, LLaMA3-8B and Qwen2-7B trained with the proposed method significantly outperform GPT-3.5-turbo and approach the performance of LLaMA3-70B—despite using only about 1/9 of its parameters.
📝 Abstract
In this paper, we propose $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., $>$ 13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., $\leq$ 7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for these neural-based SLMs to effectively capture. Therefore, NesyCD distills the general capabilities and the specialized knowledge in LLMs in different manners. On the one hand, we distill only general abilities from teacher LLMs into the student SLMs, which are parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge required by a complex reasoning task, we employ a symbolic knowledge distillation approach to obtain and store the specialized knowledge within a symbolic knowledge base (KB). By decoupling general and specialized capabilities, the proposed NesyCD achieves superior performance cost-effectively, utilizing smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enabled LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.