Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inconsistent efficacy of example-based prompting in mathematical reasoning, which varies significantly across problems and models even when the examples are correct and relevant. The study identifies and formalizes the previously overlooked issue of "strategy executability," highlighting a systematic disconnect between how human-authored and model-generated solutions employ strategies. To bridge this gap, the authors propose a source-aware selective strategy retrieval framework that dynamically integrates multi-source experiential signals, drawn from both human and model demonstrations, at inference time to select strategies with high executability. Evaluated on benchmarks including AIME25 and Apex, the approach improves accuracy by up to 13 points on AIME25 and 5 points on Apex over direct solving, standard in-context learning, and single-source prompting baselines.

📝 Abstract
Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models, even when the guidance is correct and problem-relevant. We show that this instability arises from a previously underexplored gap between strategy usage (whether a reasoning strategy appears in successful solutions) and strategy executability (whether the strategy remains effective when instantiated as guidance for a target model). Through a controlled analysis of paired human-written and model-generated solutions, we identify a systematic dissociation between usage and executability: human- and model-derived strategies differ in structured, domain-dependent ways, leading to complementary strengths and consistent source-dependent reversals under guidance. Building on this diagnosis, we propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability by selectively retrieving and combining strategies using empirical, multi-route, source-aware signals. Across multiple mathematical reasoning benchmarks, SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance, improving accuracy by up to $+13$ points on AIME25 and $+5$ points on Apex for compact reasoning models. Code and benchmark are publicly available at: https://github.com/lwd17/strategy-execute-pipeline.
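The core idea of selecting guidance by empirical executability rather than mere strategy usage can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual pipeline: the `Strategy` dataclass, the smoothed success-rate estimate, and the relevance-times-executability selection rule are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    source: str   # "human" or "model" demonstration pool
    text: str     # natural-language description of the strategy
    successes: int = 0  # guided attempts on similar problems that succeeded
    attempts: int = 0   # total guided attempts recorded

    @property
    def executability(self) -> float:
        # Laplace-smoothed empirical success rate (hypothetical choice):
        # defaults to 0.5 when no evidence has been collected yet.
        return (self.successes + 1) / (self.attempts + 2)

def select_strategy(candidates: list[Strategy],
                    relevance: dict[str, float]) -> Strategy:
    """Pick the candidate maximizing relevance * executability.

    `relevance` maps strategy text to a problem-similarity score in [0, 1];
    both the scores and the multiplicative combination are illustrative
    assumptions, not the paper's exact formulation.
    """
    return max(candidates, key=lambda s: relevance[s.text] * s.executability)

# Toy usage: the model-derived strategy has slightly lower relevance but a
# much stronger empirical executability signal, so it is selected.
human = Strategy("human", "substitute-then-factor", successes=2, attempts=6)
model = Strategy("model", "modular-arithmetic-check", successes=5, attempts=6)
best = select_strategy([human, model],
                       {"substitute-then-factor": 0.9,
                        "modular-arithmetic-check": 0.8})
print(best.source)  # model
```

The point of the sketch is the dissociation the paper diagnoses: a strategy that merely *appears* in successful human solutions (usage) can lose to one that a target model reliably *executes* under guidance.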
Problem

Research questions and friction points this paper is trying to address.

strategy executability
mathematical reasoning
example-based guidance
reasoning instability
human-model differences
Innovation

Methods, ideas, or system contributions that make the work stand out.

strategy executability
selective strategy retrieval
mathematical reasoning
human-model differences
in-context learning