🤖 AI Summary
This work challenges the prevailing assumption that performance gains of large language models (LLMs) on GSM8k stem from intrinsic improvements in mathematical reasoning capability, arguing instead that broader pretraining data coverage is the primary driver. To address the resulting generalization bottleneck—particularly for models trained on less data or with weaker training—the authors introduce discourse structure as a novel, lightweight, plug-and-play supervisory signal for mathematical reasoning. Methodologically, they construct structured prompts grounded in discourse analysis, integrate them with instruction tuning, and explicitly annotate reasoning paths. Evaluated on open-source models including Llama2-13b, the approach improves GSM8k accuracy by up to 160%. It also substantially enhances out-of-distribution (OOD) robustness and yields consistent gains even on models that have likely memorized the benchmark. The core contribution is a minimally invasive reasoning-augmentation paradigm grounded in discourse structure—an effective, modular, and broadly applicable alternative to conventional reasoning supervision.
📝 Abstract
We look at reasoning on GSM8k, a dataset of short texts presenting primary-school math problems. We find, in line with Mirzadeh et al. (2024), that current LLM progress on the dataset may be explained not by better reasoning but by exposure to a broader pretraining data distribution. We then introduce a novel information source that helps models with less data or inferior training reason better: discourse structure. We show that discourse structure improves performance for models like Llama2 13b by up to 160%. Even for models that have most likely memorized the dataset, adding discourse-structural information still improves predictions and dramatically improves large-model performance on out-of-distribution examples.