Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 2 (Influential: 0)
🤖 AI Summary
Small language models (SLMs, ≤7B parameters) exhibit limited performance on complex reasoning tasks requiring sparse domain expertise. Method: This paper proposes a neuro-symbolic collaborative distillation framework that decouples and transfers general cognitive capabilities from large language model (LLM) teachers and specialized knowledge via two parallel pathways: (i) a neural pathway distilling generic reasoning abilities, and (ii) a symbolic pathway constructing an interpretable, editable structured knowledge base using logic rules and knowledge graphs—enabling human-in-the-loop refinement. The framework integrates knowledge distillation, symbolic logic representation, knowledge graph construction, and lightweight fine-tuning. Results: On BBH and GSM8K benchmarks, LLaMA3-8B and Qwen2-7B trained with our method significantly outperform GPT-3.5-turbo and approach the performance of LLaMA3-70B—despite using only 1/9 of its parameters.

📝 Abstract
In this paper, we propose Neural-Symbolic Collaborative Distillation (NesyCD), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., >13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., ≤7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for neural-based SLMs to capture effectively. NesyCD therefore distills the general capabilities and the specialized knowledge of LLMs in different ways. On the one hand, we distill only general abilities from teacher LLMs into student SLMs as parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge required by a complex reasoning task, we employ a symbolic knowledge distillation approach to obtain and store the specialized knowledge in a symbolic knowledge base (KB). By decoupling general and specialized capabilities, NesyCD achieves superior performance cost-effectively, using smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enables LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.
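The core decoupling idea described in the abstract can be illustrated with a minimal, purely schematic sketch: specialized knowledge distilled from a teacher lives in an editable symbolic store, while the small neural student only needs general reasoning over a knowledge-augmented input. All class and function names below are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of the NesyCD decoupling idea (illustrative only, not the
# paper's code): specialized knowledge sits in a human-editable symbolic
# KB; the small neural student is handed that knowledge at inference time
# so it only needs general reasoning ability.

class SymbolicKB:
    """Editable store of specialized knowledge distilled from a teacher LLM."""

    def __init__(self):
        self.facts = {}  # topic -> list of distilled knowledge snippets

    def add(self, topic, snippet):
        # Snippets could come from symbolic knowledge distillation of a
        # teacher model; humans can also inspect and edit them directly.
        self.facts.setdefault(topic, []).append(snippet)

    def retrieve(self, topic):
        return self.facts.get(topic, [])


def build_student_prompt(question, topic, kb):
    """Augment the student's input with retrieved specialized knowledge,
    decoupling it from the student's parameterized (general) abilities."""
    context = "\n".join(f"- {k}" for k in kb.retrieve(topic))
    return f"Knowledge:\n{context}\nQuestion: {question}\nAnswer:"


kb = SymbolicKB()
kb.add("unit_conversion", "1 mile = 1.609 km")
prompt = build_student_prompt("How many km is 3 miles?", "unit_conversion", kb)
print(prompt)
```

In this sketch the KB is a plain dictionary precisely so that it is interpretable and editable, mirroring the human-in-the-loop refinement the summary mentions; the actual framework builds the KB from logic rules and knowledge graphs.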
Problem

Research questions and friction points this paper is trying to address.

Enhance small models' complex reasoning
Distill general and specialized knowledge
Merge neural networks with symbolic KB
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural-symbolic collaborative distillation method
Decouples general and specialized capabilities
Combines neural networks with symbolic knowledge base
Huanxuan Liao
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing · Large Language Model · Long Context Modeling
Shizhu He
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Yao Xu
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Yuanzhe Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing
Kang Liu
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Jun Zhao
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China