🤖 AI Summary
Existing instruction tuning relies on manually annotated seed data or strong teacher models, while instruction back-translation remains constrained by its initial seed set, which leads to error accumulation and inefficient use of unlabeled corpora.
Method: We propose the first fully seed-free instruction tuning framework. It is built on two models, an answer generator and a question generator, that form a bidirectional self-training loop. Through pseudo-label reconstruction, mutual supervision, and cycle-consistency constraints, this loop automatically synthesizes high-quality pseudo-instruction data from raw text alone.
Contribution/Results: The method removes the dependence on human annotation and external models, avoids the initialization bias introduced by seed data, and makes fuller use of raw corpora. Across four data tracks (general instruction-following, domain-specific tasks, dialogue logs, and plain text), it outperforms seed-based back-translation approaches and performs comparably to strongly supervised baselines, demonstrating for the first time that fully self-guided instruction tuning is feasible and effective.
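The reconstruction objectives implied by this loop can be written compactly. The notation below is hypothetical and not taken from the paper: $p_\theta(a \mid q)$ denotes the answer generator, $p_\phi(q \mid a)$ the question generator, and $\mathcal{D}_A$, $\mathcal{D}_Q$ are pools of raw text segments treated as answers and as questions, respectively. Each model is trained to recover an original segment from the pseudo-label its counterpart produced, with the sampled pseudo-labels held fixed.

$$
\mathcal{L}_A(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}_A,\ \hat{q} \sim p_\phi(\cdot \mid x)}\big[\log p_\theta(x \mid \hat{q})\big],
\qquad
\mathcal{L}_Q(\phi) = -\,\mathbb{E}_{y \sim \mathcal{D}_Q,\ \hat{a} \sim p_\theta(\cdot \mid y)}\big[\log p_\phi(y \mid \hat{a})\big]
$$

Under this sketch, the cycles $x \to \hat{q} \to x$ and $y \to \hat{a} \to y$ supply the cycle-consistency signal without any human-provided seed pairs.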
📝 Abstract
Instruction tuning is vital for aligning large language models (LLMs) with human intent, but current methods typically rely on costly human-annotated seed data or powerful external teacher models. While instruction back-translation techniques reduce this dependency, they remain fundamentally tethered to an initial seed set, which limits full automation, introduces biases, and can lead to inefficient use of unlabeled corpora. In this paper, we propose Cycle-Instruct, a novel framework that achieves fully seed-free instruction tuning. Inspired by cycle consistency, Cycle-Instruct employs a dual self-training loop in which two models, an answer generator and a question generator, are bootstrapped solely from raw, unlabeled text. These models mutually supervise each other by reconstructing original text segments from their counterpart's generated pseudo-labels, effectively learning from the intrinsic structure of the data without any human-provided seeds. We demonstrate Cycle-Instruct's efficacy across four diverse data tracks: general instruction-following, domain-specific tasks, dialogue logs, and plain text. Our extensive experiments show that Cycle-Instruct not only outperforms seed-driven back-translation baselines but also achieves performance comparable to strongly supervised methods.
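The data flow of the mutual-supervision loop can be illustrated with a toy Python sketch. This is not the authors' implementation: `ToyGenerator`, `cycle_round`, and the assumption that raw text is already split into "answer-like" and "question-like" segments are all hypothetical, and the "models" simply memorize pairs where a real system would fine-tune an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for an LLM generator.

    It memorizes (input -> target) training pairs and falls back to a
    fixed heuristic for inputs it has never seen.
    """
    fallback: Callable[[str], str]
    memory: Dict[str, str] = field(default_factory=dict)

    def generate(self, text: str) -> str:
        # Return a memorized output if available, else the heuristic guess.
        return self.memory.get(text, self.fallback(text))

    def train(self, pairs: List[Tuple[str, str]]) -> None:
        # Toy "fine-tuning": memorize each (source -> target) mapping.
        for source, target in pairs:
            self.memory[source] = target


def cycle_round(
    answer_gen: ToyGenerator,
    question_gen: ToyGenerator,
    answer_like: List[str],
    question_like: List[str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """One round of a hypothetical dual self-training loop."""
    # Both pseudo-label sets are produced before either model is updated.
    # Path x -> q_hat: the question generator labels answer-like text.
    answer_pairs = [(question_gen.generate(x), x) for x in answer_like]
    # Path y -> a_hat: the answer generator labels question-like text.
    question_pairs = [(answer_gen.generate(y), y) for y in question_like]
    # Mutual supervision: each model learns to reconstruct the original
    # segment from its counterpart's pseudo-label.
    answer_gen.train(answer_pairs)      # learns q_hat -> x
    question_gen.train(question_pairs)  # learns a_hat -> y
    return answer_pairs, question_pairs


if __name__ == "__main__":
    answer_gen = ToyGenerator(fallback=lambda q: "Answer: " + q)
    question_gen = ToyGenerator(
        fallback=lambda a: "What does this say? " + a.split(".")[0]
    )

    x = "Water boils at 100 degrees Celsius at sea level."
    y = "How do plants make food?"
    cycle_round(answer_gen, question_gen, [x], [y])

    # Cycle consistency: x -> q_hat -> x and y -> a_hat -> y.
    print(answer_gen.generate(question_gen.generate(x)) == x)  # True
    print(question_gen.generate(answer_gen.generate(y)) == y)  # True
```

In a real system the memorizing `train` step would be gradient-based fine-tuning and pseudo-labels would be sampled from an LLM, so reconstruction would be approximate rather than exact; the toy only shows how unlabeled segments flow through the two paths and how each model supervises the other.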