Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Expert solutions, while pedagogically valuable, are scarce and often contain implicit reasoning gaps, making them difficult for large language models to imitate directly. To address this, this work proposes Distribution-Aligned Imitation Learning (DAIL), a two-stage approach that first reconstructs expert solutions into detailed, learnable reasoning trajectories and then applies contrastive learning to emphasize core problem-solving insights. DAIL is the first method to systematically mitigate the distributional discrepancy between expert demonstrations and model-generated reasoning. Using fewer than 1,000 expert examples, it achieves 10–25% pass@k improvements on Qwen2.5-Instruct and Qwen3, improves reasoning efficiency by 2–4×, and generalizes well beyond the training domain.

📝 Abstract
Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or on the existence of a stronger model able to solve the problem. However, many difficult problems remain intractable for even current frontier models, preventing the extraction of valid training signals. A promising alternative is to leverage high-quality expert human solutions, yet naive imitation of this data fails because it is fundamentally out of distribution: expert solutions are typically didactic, containing implicit reasoning gaps intended for human readers rather than computational models. Furthermore, high-quality expert solutions are expensive, necessitating generalizable sample-efficient training methods. We propose Distribution Aligned Imitation Learning (DAIL), a two-step method that bridges the distributional gap by first transforming expert solutions into detailed, in-distribution reasoning traces and then applying a contrastive objective to focus learning on expert insights and methodologies. We find that DAIL can leverage fewer than 1000 high-quality expert solutions to achieve 10-25% pass@k gains on Qwen2.5-Instruct and Qwen3 models, improve reasoning efficiency by 2x to 4x, and enable out-of-domain generalization.
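The abstract's two-step recipe (rewrite didactic expert solutions into in-distribution reasoning traces, then train with a contrastive objective) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `rewrite_to_trace` is a hypothetical stand-in for the trace-reconstruction step (e.g. an LLM prompted to fill in a human solution's implicit gaps), and the DPO-style pairwise loss is just one plausible form of a contrastive objective, which the abstract does not specify.

```python
import math


def contrastive_loss(logp_expert_trace: float, logp_model_trace: float,
                     beta: float = 0.1) -> float:
    """DPO-style pairwise loss (one plausible instantiation of a
    contrastive objective): prefer the expert-aligned trace over the
    model's own attempt. Computes -log sigmoid(beta * margin)."""
    margin = beta * (logp_expert_trace - logp_model_trace)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


def dail_step(expert_solution: str, rewrite_to_trace, policy_logp,
              model_trace: str) -> float:
    """One training signal for a single problem (hypothetical sketch).

    rewrite_to_trace: fills the implicit reasoning gaps in a didactic
        human solution, producing a detailed, in-distribution trace.
    policy_logp: returns the policy's log-probability of a trace.
    model_trace: the model's own (typically incorrect) reasoning.
    """
    # Step 1: bridge the distribution gap between expert prose and
    # model-style reasoning.
    expert_trace = rewrite_to_trace(expert_solution)
    # Step 2: contrastive objective that focuses learning on what the
    # expert trace does differently from the model's own trace.
    return contrastive_loss(policy_logp(expert_trace),
                            policy_logp(model_trace))
```

When the policy assigns equal log-probability to both traces, the loss is ln 2 ≈ 0.693; it shrinks as the policy shifts mass toward the expert-aligned trace, which is the intended direction of the update.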
Problem

Research questions and friction points this paper is trying to address.

- expert solutions
- reasoning capability
- distribution gap
- sample efficiency
- large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Distribution Aligned Imitation Learning
- expert reasoning
- in-distribution reasoning traces
- contrastive learning
- sample-efficient training