🤖 AI Summary
Problem: In biology, wet-lab labels are scarce and expensive, which limits the construction of synthetic chain-of-thought (CoT) reasoning data. Method: We propose a label-free filtering framework for synthetic reasoning data. It uses model-intrinsic uncertainty metrics, such as self-consistency and predictive perplexity, in place of ground-truth labels: multiple CoT traces are sampled and only the low-uncertainty ones are retained. Filtering is applied per class to account for class-specific uncertainty scales, and hybrid metrics that combine several uncertainty signals are also evaluated. Results: On biological perturbation prediction, supervised fine-tuning on the filtered data outperforms training on unfiltered synthetic data, narrows the gap to training with ground-truth labels, and surpasses strong LRM baselines. The work introduces an uncertainty-driven, label-free filtering mechanism for biological reasoning data generation, offering a scalable, low-cost approach to dataset construction in scientific domains where supervision is expensive.
📝 Abstract
Synthetic chain-of-thought (CoT) traces are widely used to train large reasoning models (LRMs), improving generalization by providing step-level supervision. Yet most approaches require ground-truth labels to seed or filter these traces, an expensive bottleneck in domains like biology where wet-lab data are scarce. We propose a label-free alternative: uncertainty-based filtering, which uses a model's own confidence, quantified through established uncertainty metrics such as self-consistency and predictive perplexity, as a substitute for external labels. We sample multiple reasoning traces and retain only the low-uncertainty subset. Applied to biological perturbation prediction, we show that the filtered subset has higher accuracy than the unfiltered samples, and that supervised fine-tuning (SFT) on uncertainty-filtered data outperforms SFT on unfiltered synthetic data, narrows the gap to ground-truth training, and surpasses strong LRM baselines. Ablations show that per-class filtering corrects for class-specific uncertainty scales and that hybrid uncertainty metrics yield higher-quality datasets. Our results suggest that model-internal confidence is a powerful signal for efficient reasoning dataset creation, enabling LRMs in domains where supervision is expensive.
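As a rough illustration of the filtering idea described in the abstract, the sketch below computes two uncertainty signals (agreement among sampled answers, and perplexity from token log-probabilities) and then keeps the lowest-uncertainty items within each predicted class. It is not the paper's implementation: the function names, the dictionary fields (`id`, `label`, `uncertainty`), and the `keep_fraction` parameter are hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return (majority answer, fraction of sampled answers agreeing with it).

    The agreement fraction is a simple self-consistency score;
    1 - agreement can serve as an uncertainty value.
    """
    counts = Counter(answers)
    label, n = counts.most_common(1)[0]
    return label, n / len(answers)


def perplexity(token_logprobs):
    """Perplexity of one trace: exp of the mean negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_per_class(items, keep_fraction=0.5):
    """Keep the lowest-uncertainty fraction of items within each predicted class.

    `items` is a list of dicts with hypothetical keys 'id', 'label', and
    'uncertainty'. Ranking inside each class, rather than applying one
    global cutoff, avoids discarding a class whose uncertainty values
    are systematically higher.
    """
    by_class = defaultdict(list)
    for item in items:
        by_class[item["label"]].append(item)

    kept = []
    for group in by_class.values():
        ranked = sorted(group, key=lambda x: x["uncertainty"])
        n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
        kept.extend(ranked[:n_keep])
    return kept
```

In this sketch, a global cutoff over four items with uncertainties 0.1 and 0.4 (class "up") and 0.6 and 0.9 (class "down") would keep only "up" examples, whereas `filter_per_class` with `keep_fraction=0.5` keeps the most confident item from each class. How the paper combines signals into a hybrid metric is not specified in the abstract, so no combination rule is shown here.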