AI Summary
This work addresses an inefficiency in synthetic data generation with large language models: current pipelines typically produce full outputs before filtering out low-quality samples, which wastes a substantial number of tokens. The authors propose a Multi-Stage In-Flight Rejection (MSIFR) framework that, for the first time, formulates synthetic data generation as a sequential decision process. MSIFR introduces multiple checkpoints during generation and uses lightweight rule-based verifiers to terminate low-quality trajectories early, namely those exhibiting arithmetic errors, hallucinations, or format violations. Grounded in martingale theory, MSIFR reduces expected token consumption without biasing the expected utility of the retained samples. Experiments across five instruction-tuned models and seven reasoning benchmarks show that MSIFR alone reduces token consumption by 11%-77%, and by up to 78.2% when combined with early-exit strategies, while maintaining or even improving evaluation accuracy.
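The checkpoint-and-verify loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the validator rules, the `Step N:` / `Answer:` line format, the whitespace token count, and every function name below are assumptions chosen for illustration. A hallucination check is omitted because the summary does not say which patterns are used.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# A validator returns a rejection reason, or None if the partial text passes.
Validator = Callable[[str], Optional[str]]

# Hypothetical rule: find "a op b = c" claims with integer operands.
ARITH = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")
# Hypothetical format: every non-empty line starts with "Step N:" or "Answer:".
STEP_LINE = re.compile(r"^(Step \d+:|Answer:)")


def check_arithmetic(text: str) -> Optional[str]:
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    for a, op, b, c in ARITH.findall(text):
        if ops[op](int(a), int(b)) != int(c):
            return f"arithmetic: {a} {op} {b} != {c}"
    return None


def check_format(text: str) -> Optional[str]:
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not STEP_LINE.match(stripped):
            return f"format: unexpected line {stripped!r}"
    return None


VALIDATORS: List[Validator] = [check_arithmetic, check_format]


def generate_with_rejection(
    stages: Iterable[str], validators: List[Validator]
) -> Tuple[Optional[str], int, Optional[str]]:
    """Consume generation stage by stage and stop at the first failed checkpoint.

    Returns (text, tokens_spent, reason); text is None when the sample is rejected.
    """
    text, tokens = "", 0
    for chunk in stages:  # a lazy iterable: later stages are never produced after a reject
        text += chunk
        tokens += len(chunk.split())  # whitespace tokens stand in for model tokens
        for validate in validators:
            reason = validate(text)
            if reason is not None:
                return None, tokens, reason
    return text, tokens, None


def fake_model(chunks: List[str], produced: List[str]) -> Iterable[str]:
    """Stand-in for a staged LLM call; records which stages were actually generated."""
    for chunk in chunks:
        produced.append(chunk)
        yield chunk


good = ["Step 1: 2 + 3 = 5\n", "Step 2: 5 * 4 = 20\n", "Answer: 20\n"]
bad = ["Step 1: 2 + 3 = 6\n", "Step 2: 6 * 4 = 24\n", "Answer: 24\n"]

for name, chunks in [("good", good), ("bad", bad)]:
    produced: List[str] = []
    text, tokens, reason = generate_with_rejection(fake_model(chunks, produced), VALIDATORS)
    print(name, "kept" if text is not None else "rejected", tokens, len(produced), reason)
```

In this toy run the faulty trajectory is dropped at the first checkpoint, so its remaining two stages are never generated. With a real model, the generator would wrap a decode call that stops at stage boundaries.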
Abstract
While synthetic data generation with large language models (LLMs) is widely used in post-training pipelines, existing approaches typically generate full outputs before applying quality filters, leading to substantial token waste on samples that are ultimately discarded. To address this, we propose Multi-Stage In-Flight Rejection (MSIFR), a lightweight, training-free framework that detects and terminates low-quality generation trajectories at intermediate checkpoints before they reach full completion. MSIFR decomposes the generation process into sequential stages and applies fast rule-based validators to identify arithmetic inconsistencies, hallucination patterns, and formatting violations, enabling early rejection of faulty samples. We formalize in-flight rejection as a sequential decision process and show that any non-trivial discard policy reduces expected token consumption, with stage-wise savings increasing when rejection occurs earlier in the generation pipeline. We further demonstrate that conditional utility estimates form a martingale, ensuring that early, in-flight rejection does not bias the expected utility of retained samples. Across five instruction-tuned models and seven reasoning benchmarks, MSIFR reduces token consumption by 11%-77% as a standalone method, and up to 78.2% when combined with early-exit methods, while preserving or improving evaluation accuracy. These results confirm that MSIFR provides a practical mechanism for improving the efficiency of LLM-based synthetic data generation without additional training or architectural changes.
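The two theoretical claims in the abstract can be made concrete with a short sketch. The notation here is an assumption for illustration and may differ from the paper's: there are $K$ stages, stage $k$ costs a fixed $c_k > 0$ tokens, $S_k \in \{0,1\}$ indicates that a trajectory has passed every checkpoint up to and including stage $k$ (with $S_0 = 1$), $U$ is the final utility of a sample, and $\mathcal{F}_k$ is the information available at checkpoint $k$.

```latex
% Expected token cost without and with in-flight rejection
\begin{align*}
\mathbb{E}[T_{\text{full}}] &= \sum_{k=1}^{K} c_k, \\
\mathbb{E}[T_{\text{reject}}] &= \sum_{k=1}^{K} c_k \, \Pr(S_{k-1} = 1), \\
\mathbb{E}[T_{\text{full}}] - \mathbb{E}[T_{\text{reject}}]
  &= \sum_{k=2}^{K} c_k \bigl(1 - \Pr(S_{k-1} = 1)\bigr) \;\ge\; 0 .
\end{align*}

% Tokens saved by a rejection at checkpoint j: all later stages are skipped,
% so the saving shrinks as j grows (earlier rejection saves more).
\[
\text{saved}(j) = \sum_{k=j+1}^{K} c_k, \qquad
\text{saved}(j) > \text{saved}(j+1) \quad \text{for } j < K - 1 .
\]

% Conditional utility estimates form a (Doob) martingale by the tower property,
% assuming U is integrable.
\[
U_k := \mathbb{E}[U \mid \mathcal{F}_k]
\quad\Longrightarrow\quad
\mathbb{E}[U_{k+1} \mid \mathcal{F}_k]
  = \mathbb{E}\bigl[\mathbb{E}[U \mid \mathcal{F}_{k+1}] \mid \mathcal{F}_k\bigr]
  = U_k .
\]
```

The saving is strictly positive whenever some checkpoint before the final stage rejects with positive probability, which is one way to read "any non-trivial discard policy". The martingale identity is the standard tower-property fact; how the paper uses it to establish unbiasedness of the retained samples is not reproduced here.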