🤖 AI Summary
This work investigates how the choice of teacher model affects the performance of open-source student models in reasoning-oriented knowledge distillation. Method: using a shared corpus of 1.89 million queries, the authors construct three parallel distillation datasets from state-of-the-art teachers (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) and systematically compare output quality, revealing for the first time that the quality of reasoning traces diverges significantly across teachers, even when the final answers are correct. They propose filtering on high-quality, verified reasoning traces (beyond answer correctness), validated via distributional analysis, perplexity estimation, and analysis of adaptive response length. Contribution/Results: a student model trained on AM-Thinking-v1-distilled data sets a new SOTA across AIME2024 (84.3), AIME2025 (72.2), MATH500 (98.4), and LiveCodeBench (65.9), while exhibiting difficulty-aware adaptive response length.
📝 Abstract
Distillation has emerged as a practical and effective approach to enhancing the reasoning capabilities of open-source language models. In this work, we conduct a large-scale empirical study of reasoning data distillation by collecting verified outputs from three state-of-the-art teacher models (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) on a shared corpus of 1.89 million queries. We construct three parallel datasets and analyze their distributions, finding that the AM-Thinking-v1-distilled data exhibits greater token-length diversity and lower perplexity. Student models trained on each dataset are evaluated on reasoning benchmarks including AIME2024, AIME2025, MATH500, and LiveCodeBench. The AM-based student consistently achieves the best performance (e.g., 84.3 on AIME2024, 72.2 on AIME2025, 98.4 on MATH500, and 65.9 on LiveCodeBench) and demonstrates adaptive output behavior, producing longer responses for harder tasks and shorter ones for simpler tasks. These findings highlight the value of high-quality, verified reasoning traces. We release the AM-Thinking-v1 and Qwen3-235B-A22B distilled datasets to support future research on open, high-performing reasoning-oriented language models. Both datasets are publicly available on Hugging Face: [AM-Thinking-v1-Distilled](https://huggingface.co/datasets/a-m-team/AM-Thinking-v1-Distilled) and [AM-Qwen3-Distilled](https://huggingface.co/datasets/a-m-team/AM-Qwen3-Distilled).
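The abstract uses perplexity as one signal of distilled-data quality (lower perplexity suggesting more fluent, predictable reasoning traces). As a minimal sketch of the underlying metric, assuming you already have per-token log-probabilities for a trace from some reference language model (the function name and the example scores below are hypothetical, not from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean per-token log-probability) over a trace.

    token_logprobs: list of natural-log probabilities, one per token,
    as scored by a reference language model.
    """
    if not token_logprobs:
        raise ValueError("empty trace")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token scores for two reasoning traces:
fluent  = [-0.2, -0.1, -0.3, -0.15]   # high-probability, coherent tokens
erratic = [-2.5, -0.4, -3.1, -1.9]    # surprising, low-probability tokens

print(perplexity(fluent))   # lower value -> more fluent trace
print(perplexity(erratic))  # higher value -> less predictable trace
```

A dataset-level comparison like the one in the paper would average such per-trace scores across each teacher's distilled corpus; the scoring model and aggregation details are not specified in the abstract.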