🤖 AI Summary
Large language models (LLMs) often underperform on unlabeled, out-of-distribution, and structurally novel reasoning tasks. To address this, we propose a verifier-driven test-time training (TTT) framework. Our method employs a lightweight verifier to score candidate responses, dynamically selecting high-confidence pseudo-labeled samples for online, unsupervised adaptation by fine-tuning only low-rank LoRA adapters, eliminating the need for full-parameter updates or human annotations. This enables efficient, continuous, and resource-light model self-improvement. Compared to full-parameter TTT, our approach substantially reduces computational overhead; unlike conventional verifier-based methods, it avoids static evaluation and manual labeling. Evaluated across three benchmarks and three state-of-the-art LLMs, our framework achieves up to a 32.29% relative improvement over baselines and a 6.66% gain over verifier-only methods without TTT, while converging faster and requiring fewer resources.
📝 Abstract
Learning to adapt pretrained language models to unlabeled, out-of-distribution data is a critical challenge, as models often falter on structurally novel reasoning tasks even while excelling within their training distribution. To address this efficiently, we introduce VDS-TTT, a Verifier-Driven Sample Selection framework for Test-Time Training. We use a learned verifier to score a pool of generated responses and select only high-ranking pseudo-labeled examples for adaptation via fine-tuning. Specifically, for each input query the LLM generates N candidate answers; the verifier assigns a reliability score to each, and the highest-confidence response that exceeds a fixed threshold is paired with its query for test-time training. We fine-tune only low-rank LoRA adapter parameters, ensuring adaptation efficiency and fast convergence. Our proposed self-supervised framework is the first to synthesize verifier-driven test-time training data for continuous self-improvement of the model. Experiments across three diverse benchmarks and three state-of-the-art LLMs demonstrate that VDS-TTT yields up to a 32.29% relative improvement over the base model and a 6.66% gain compared to verifier-based methods without test-time training, highlighting its effectiveness and efficiency for on-the-fly large language model adaptation.
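The sample-selection step described above (generate N candidates, score each with the verifier, keep only the highest-confidence response above a fixed threshold) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate`, `verifier_score`, and the threshold `tau` are hypothetical stand-ins for the LLM sampler, the learned verifier, and the confidence cutoff, and the subsequent LoRA fine-tuning step is elided.

```python
def select_pseudo_label(query, generate, verifier_score, n=8, tau=0.9):
    """Sketch of verifier-driven sample selection for test-time training.

    Samples n candidate answers for `query`, scores each with the verifier,
    and returns the top candidate only if its score clears the threshold
    `tau`; otherwise the query is skipped. A selected (query, answer) pair
    would then be used to fine-tune LoRA adapter parameters (not shown).

    `generate` and `verifier_score` are hypothetical callables standing in
    for the LLM and the learned verifier.
    """
    # Sample n candidate responses from the model.
    candidates = [generate(query) for _ in range(n)]
    # Score each candidate's reliability with the verifier.
    scored = [(verifier_score(query, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    # Keep only high-confidence pseudo-labels; drop uncertain queries.
    return best if best_score >= tau else None
```

Thresholding (rather than always taking the arg-max) is what keeps the pseudo-labels high-precision: queries where even the best candidate is unreliable contribute no training signal instead of a noisy one.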