A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing mathematical word problem (MWP) solvers—such as Seq2Seq, Seq2Tree, and Graph2Tree—suffer from limited solution diversity in equation generation, hindering generalization. To address this, we propose a diversity-enhanced knowledge distillation framework. Our method employs two key components: (1) an adaptive diversity-aware distillation mechanism that dynamically selects high-quality, semantically equivalent yet structurally diverse equations generated by the teacher model; and (2) a conditional variational autoencoder (CVAE) to explicitly model the prior distribution over diverse solution paths, thereby guiding the student model to explore a broader space of equivalent equations. Evaluated on four mainstream MWP benchmarks, our approach significantly outperforms strong baselines in answer accuracy while maintaining efficient inference, demonstrating practical deployability.

📝 Abstract
Math Word Problem (MWP) solving is a critical task in natural language processing that has garnered significant research interest in recent years. Recent studies rely heavily on Seq2Seq models and their extensions (e.g., Seq2Tree and Graph2Tree) to generate mathematical equations. While effective, these models struggle to generate diverse yet equivalent solution equations, which limits their generalization across varied math problem scenarios. In this paper, we introduce a novel Diversity-enhanced Knowledge Distillation (DivKD) model for practical MWP solving. Our approach proposes an adaptive diversity distillation method, in which a student model learns diverse equations by selectively transferring high-quality knowledge from a teacher model. Additionally, we design a diversity prior-enhanced student model that better captures the diversity distribution of equations by incorporating a conditional variational auto-encoder. Extensive experiments on four MWP benchmark datasets demonstrate that our approach achieves higher answer accuracy than strong baselines while maintaining high efficiency for practical applications.
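The core selection idea behind the adaptive diversity distillation step can be illustrated with a toy sketch: from a pool of teacher-generated candidate equations, keep only those that evaluate to the gold answer (semantic equivalence) and whose operator structure has not been seen yet (structural diversity). This is a minimal, hypothetical illustration, not the paper's actual learned mechanism; the function names `select_diverse` and `skeleton`, the string-equation format, and the exact-match tolerance are all our own assumptions.

```python
import ast
import operator

# Illustrative only: the paper's adaptive distillation selects diverse
# teacher equations with a learned criterion; this sketch mimics the
# filtering idea with hand-written rules over arithmetic strings.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr: str) -> float:
    """Safely evaluate an arithmetic expression string via its AST."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def skeleton(expr: str) -> str:
    """Operator skeleton of an equation, a crude proxy for its structure."""
    tree = ast.parse(expr, mode="eval")
    return "".join(type(n.op).__name__
                   for n in ast.walk(tree) if isinstance(n, ast.BinOp))

def select_diverse(candidates: list[str], gold: float) -> list[str]:
    """Keep candidates that match the gold answer and add a new structure."""
    seen, kept = set(), []
    for cand in candidates:
        if abs(evaluate(cand) - gold) > 1e-6:
            continue  # semantically wrong: not a valid solution
        sig = skeleton(cand)
        if sig in seen:
            continue  # structurally redundant with an earlier candidate
        seen.add(sig)
        kept.append(cand)
    return kept
```

For example, with gold answer 20, the candidates `["4*5", "5*4", "10+10", "39/2", "(2+2)*5"]` reduce to `["4*5", "10+10", "(2+2)*5"]`: `5*4` duplicates the multiplication skeleton, and `39/2` is numerically wrong. In the paper, such diverse equivalent targets are then used as distillation signal for the student model.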
Problem

Research questions and friction points this paper is trying to address.

Mathematical Problem Solving
Model Diversity
Equivalence Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diverse Solution Learning
Conditional Variational Autoencoder
Enhanced Knowledge Distillation
Yi Zhang
Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, 430079, Hubei, China; School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan, 430073, Hubei, China
Guangyou Zhou
Professor, Central China Normal University
Natural Language Processing
Knowledge Graph
Deep Learning
Zhiwen Xie
School of Computer Science, Wuhan University
NLP
Jinjin Ma
Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, 430079, Hubei, China
J. Huang
Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Canada