SalaMAnder: Shapley-based Mathematical Expression Attribution and Metric for Chain-of-Thought Reasoning

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
While Chain-of-Thought (CoT) prompting significantly enhances large language models’ mathematical reasoning, its underlying mechanisms remain poorly understood. Method: We propose SalaMAnder, the first framework to integrate Shapley values into few-shot CoT reasoning, enabling fine-grained attribution over mathematical expressions. Based on this, we design CoSP—a novel, interpretable evaluation metric—and reduce computational complexity via hierarchical sampling. We further validate CoSP’s monotonic correlation with model performance using covariance analysis. Results: Extensive experiments across major LLMs and mathematical benchmarks (e.g., GSM8K, MATH) demonstrate that CoSP consistently reflects reasoning quality. It not only identifies the attributional basis for CoT’s effectiveness but also provides empirically verifiable theoretical grounding for prompt engineering. By unifying disparate interpretations of reasoning mechanisms, CoSP advances principled understanding of CoT.

📝 Abstract
Chain-of-Thought (CoT) prompting enhances the mathematical reasoning capability of large language models (LLMs) by a large margin. However, the mechanism underlying such improvements remains unexplored. In this paper, we present SalaMAnder (Shapley-based Mathematical Expression Attribution and Metric), a theoretically grounded methodology as well as a mathematically rigorous evaluation metric for quantifying component-level contributions in few-shot CoT reasoning. Concretely, we leverage the Shapley value for mathematical expression attribution and develop an efficient stratified sampling algorithm that significantly reduces the computational complexity. In addition, we develop the CoSP (Cardinality of Shapley Positives) metric through covariance analysis. Comprehensive validation across popular LLMs and diverse mathematical benchmarks demonstrates that the CoSP metric within our SalaMAnder framework exhibits a robust monotonic correlation with model performance, not only providing theoretical explanations for the empirical success of existing few-shot CoT but also establishing mathematically rigorous principles for prompt construction optimization. Furthermore, we verify the reliability of the explanation and, based on it, unify the insights of previous work.
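The paper's exact estimator is not reproduced here, but the core idea of stratified-sampling Shapley attribution can be sketched as follows. Instead of enumerating all 2^n coalitions, marginal contributions are sampled within strata defined by coalition size and then averaged uniformly over sizes, which matches the Shapley weighting. The component names, `value_fn`, and sample counts below are illustrative assumptions, not the paper's implementation:

```python
import random
from statistics import mean

def shapley_stratified(components, value_fn, samples_per_stratum=10, seed=0):
    """Estimate Shapley values by stratified sampling over coalition sizes.

    components: list of prompt components (e.g. math expressions in a CoT exemplar)
    value_fn:   maps a tuple of components to a real-valued score (e.g. accuracy)
    """
    rng = random.Random(seed)
    n = len(components)
    phi = {}
    for i, c in enumerate(components):
        rest = components[:i] + components[i + 1:]
        strata_means = []
        for k in range(n):  # one stratum per coalition size k = 0..n-1
            marginals = []
            for _ in range(samples_per_stratum):
                coalition = tuple(rng.sample(rest, k))
                # marginal contribution of c to this coalition
                marginals.append(value_fn(coalition + (c,)) - value_fn(coalition))
            strata_means.append(mean(marginals))
        # uniform average over sizes reproduces the Shapley weighting
        phi[c] = mean(strata_means)
    return phi
```

For an additive value function (each component contributes a fixed weight), the estimate recovers the weights exactly, which is a quick sanity check on the implementation.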
Problem

Research questions and friction points this paper is trying to address.

Quantifying component-level contributions in few-shot Chain-of-Thought reasoning
Developing mathematically rigorous evaluation metrics for CoT prompt optimization
Explaining the underlying mechanisms behind CoT reasoning improvements in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley values for mathematical expression attribution
Efficient stratified sampling algorithm that reduces computational complexity
CoSP metric that correlates monotonically with model performance
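As we read the abstract, CoSP (Cardinality of Shapley Positives) counts the exemplar components whose Shapley attribution is positive. A minimal sketch under that assumption (the function name and the `eps` threshold are ours, not the paper's):

```python
def cosp(shapley_values, eps=0.0):
    """Cardinality of Shapley Positives: the number of components whose
    estimated Shapley value exceeds a small threshold eps.

    shapley_values: dict mapping each component to its estimated Shapley value.
    """
    return sum(1 for v in shapley_values.values() if v > eps)
```

A higher CoSP would indicate that more components of a prompt contribute positively, which is the quantity the paper reports as monotonically correlated with model performance.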
Yue Xin
University of Maryland
educational games, science education, interdisciplinary learning, computational thinking, culturally responsive teaching
Chen Shen
Alibaba Cloud Computing
Shaotian Yan
Alibaba Group
Machine Learning, Computer Vision, Large Language Models
Xiaosong Yuan
Jilin University | Alibaba Group
NLP, LLM, Deep Learning
Yaoming Wang
Shanghai Jiao Tong University
Xiaofeng Zhang
Shanghai Jiao Tong University, Alibaba Cloud Computing
Chenxi Huang
Alibaba Cloud Computing
Jieping Ye
Alibaba Cloud Computing