AI Summary
This work addresses two key challenges in synthetic instruction data construction: the difficulty of generating high-quality data and the trade-off between reasoning-intensive and non-reasoning tasks. To this end, we propose CoT-Self-Instruct, a novel framework that unifies chain-of-thought (CoT) reasoning with the Self-Instruct paradigm. It guides large language models to perform structured reasoning and planning over seed tasks, enabling controllable complexity and semantic richness in the generated instructions, and it incorporates automatic quality-filtering metrics for data curation. Empirically, CoT-Self-Instruct consistently improves performance on both reasoning benchmarks (e.g., MATH500, AMC23) and instruction-following benchmarks (e.g., AlpacaEval 2.0, Arena-Hard), significantly outperforming prior synthetic datasets (e.g., s1k, OpenMathReasoning) as well as human-authored instruction data. These results support the effectiveness and generalizability of reasoning-guided synthetic data generation for instruction tuning.
Abstract
We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on the given seed tasks, and then to generate a new synthetic prompt of similar quality and complexity for use in LLM training, followed by filtering for high-quality data with automatic metrics. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, across MATH500, AMC23, AIME24 and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of human or standard self-instruct prompts on both AlpacaEval 2.0 and Arena-Hard.
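To make the generate-then-filter recipe concrete, here is a minimal sketch of such a pipeline. It assumes a hypothetical `llm_generate` callable for the LLM, a hypothetical `quality_score` callable for the automatic filtering metric, and a simple prompt template with a "New prompt:" marker; the paper's actual prompts, markers, and filtering metrics may differ.

```python
# Hypothetical sketch of a CoT-Self-Instruct-style data generation loop.
# `llm_generate` and `quality_score` are placeholders for an LLM call and
# an automatic quality filter (e.g., a reward-model score); they are not
# the paper's exact components.

import random
from typing import Callable, List

COT_TEMPLATE = """You are given example prompts:
{seeds}

First, reason step by step about what makes these prompts effective
(topic, required reasoning, difficulty). Then write ONE new prompt of
similar quality and complexity, prefixed with "New prompt:".

Reasoning:"""

def cot_self_instruct(
    seed_prompts: List[str],
    llm_generate: Callable[[str], str],
    quality_score: Callable[[str], float],
    num_samples: int = 1000,
    seeds_per_call: int = 2,
    threshold: float = 0.8,
) -> List[str]:
    """Generate synthetic prompts via CoT planning over seeds, then filter."""
    kept = []
    for _ in range(num_samples):
        seeds = random.sample(seed_prompts, seeds_per_call)
        prompt = COT_TEMPLATE.format(seeds="\n\n".join(seeds))
        output = llm_generate(prompt)
        # Assume the model emits its reasoning first, then the new prompt
        # after the "New prompt:" marker.
        new_prompt = output.split("New prompt:")[-1].strip()
        # Retain only candidates that pass the automatic quality filter.
        if new_prompt and quality_score(new_prompt) >= threshold:
            kept.append(new_prompt)
    return kept
```

In this sketch the filtering step is a single scalar threshold; in practice the choice of metric would differ between verifiable reasoning prompts and non-verifiable instruction-following prompts, as the evaluation setup in the abstract suggests.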