🤖 AI Summary
To address the scarcity of high-quality supervised fine-tuning (SFT) data, which critically limits large language model (LLM) alignment performance, this paper proposes Condor, a two-stage synthetic data generation framework. In Stage I, a world knowledge tree is constructed to encode structured prior knowledge and steer instruction generation toward diverse, grounded topics. In Stage II, an LLM-driven self-reflection refinement mechanism iteratively critiques and improves the generated instruction-response pairs. Condor introduces a knowledge-driven synthesis paradigm that enables continuous self-improvement across model scales from 7B to 72B parameters. Base models fine-tuned on only 20K Condor-generated samples surpass multiple strong baselines on multi-dimensional alignment benchmarks, including helpfulness, honesty, and harmlessness, demonstrating both high-fidelity data synthesis and scalable alignment enhancement.
📝 Abstract
The quality of Supervised Fine-Tuning (SFT) data plays a critical role in enhancing the conversational capabilities of Large Language Models (LLMs). However, as LLMs become more advanced, the availability of high-quality human-annotated SFT data has become a significant bottleneck, necessitating a greater reliance on synthetic training data. In this work, we introduce Condor, a novel two-stage synthetic data generation framework that incorporates a World Knowledge Tree and Self-Reflection Refinement to produce high-quality SFT data at scale. Our experimental results demonstrate that a base model fine-tuned on only 20K Condor-generated samples achieves superior performance compared to its counterparts. The additional refinement stage in Condor further enables iterative self-improvement for LLMs at various scales (up to 72B), validating the effectiveness of our approach. Furthermore, our investigation into the scaling of synthetic data in post-training reveals substantial unexplored potential for performance improvements, opening promising avenues for future research.
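The two-stage pipeline described above can be sketched as a minimal toy program. This is an illustrative assumption of the overall control flow only: the `mock_llm` stand-in, the tiny hard-coded knowledge tree, and all function names are hypothetical and not the paper's actual API or prompts.

```python
import random

def mock_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an API client)."""
    return f"response to: {prompt[:40]}"

# Stage I (hypothetical): a tiny world knowledge tree, domain -> field -> topics.
KNOWLEDGE_TREE = {
    "science": {"physics": ["relativity", "thermodynamics"]},
    "arts": {"music": ["harmony", "rhythm"]},
}

def sample_topic_path(tree, rng):
    """Walk the tree top-down to pick a concrete topic path."""
    domain = rng.choice(sorted(tree))
    field = rng.choice(sorted(tree[domain]))
    topic = rng.choice(sorted(tree[domain][field]))
    return [domain, field, topic]

def generate_pair(path):
    """Turn a topic path into a synthetic instruction-response pair."""
    instruction = mock_llm(f"Write a question about: {' > '.join(path)}")
    response = mock_llm(instruction)
    return {"instruction": instruction, "response": response}

# Stage II (hypothetical): self-reflection refinement loop.
def refine(pair, rounds=2):
    """Critique the response, then rewrite it, for a fixed number of rounds."""
    for _ in range(rounds):
        critique = mock_llm(f"Critique this answer: {pair['response']}")
        pair["response"] = mock_llm(
            f"Rewrite the answer given this critique: {critique}"
        )
    return pair

rng = random.Random(0)
dataset = [
    refine(generate_pair(sample_topic_path(KNOWLEDGE_TREE, rng)))
    for _ in range(3)
]
```

With a real model behind `mock_llm`, the same loop would emit refined instruction-response pairs ready to be filtered and used as SFT data.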