Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation

📅 2025-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-time augmentation (TTA) methods suffer from limited generality and struggle to jointly support vision and non-vision tasks. To address this, the paper proposes Generalized Test-Time Augmentation (GTTA), a unified framework that applies random perturbations within a PCA subspace to construct robust test-time ensembles, coupled with self-supervised knowledge distillation, in which the ensemble predictions guide lightweight single-model training so that inference overhead becomes negligible. GTTA is the first TTA framework to generalize across images, speech, and tabular data while natively supporting diverse tasks, including classification, regression, semantic segmentation, and object detection. Extensive experiments demonstrate state-of-the-art performance on benchmarks spanning image classification, semantic segmentation, automatic speech recognition, and house price prediction, outperforming existing TTA approaches and prior SOTA models. GTTA is further validated on a real-world application, low-visibility underwater salmon detection, for which the authors publicly release DeepSalmon, a large-scale, real-scenario dataset.

📝 Abstract
We introduce Generalized Test-Time Augmentation (GTTA), a highly effective method for improving the performance of a trained model which, unlike other existing Test-Time Augmentation approaches from the literature, is general enough to be used off-the-shelf for many vision and non-vision tasks, such as classification, regression, image segmentation and object detection. By applying a new general data transformation that randomly perturbs the PCA-subspace projection of a test input multiple times, GTTA forms robust ensembles at test time in which, due to sound statistical properties, the structural and systematic noise in the initial input data is filtered out and final estimator errors are reduced. Different from other existing methods, we also propose a final self-supervised learning stage in which the ensemble output, acting as an unsupervised teacher, is used to train the initial single student model, significantly reducing the test-time computational cost at no loss in accuracy. Our tests and comparisons against strong TTA approaches and SoTA models on well-known vision and non-vision datasets and tasks, such as image classification and segmentation, speech recognition and house price prediction, validate the generality of the proposed GTTA. Furthermore, we also demonstrate its effectiveness on the more specific real-world task of salmon segmentation and detection in low-visibility underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature.
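The core transformation described above can be illustrated with a minimal NumPy sketch. Everything here is hypothetical scaffolding, not the paper's implementation: the toy data, the `model` stand-in, and the parameter names (`n_aug`, `sigma`) are assumptions. The sketch estimates a PCA basis, repeatedly perturbs a test input's subspace coefficients with random noise, and averages the model's predictions over the perturbed copies to form the test-time ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training" data, used only to estimate a PCA basis (hypothetical setup).
X_train = rng.normal(size=(200, 16))
mean = X_train.mean(axis=0)
# PCA basis via SVD of the centered data; keep the top-k principal directions.
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
k = 8
basis = Vt[:k]                                   # (k, 16)

def model(x):
    # Stand-in predictor; any trained model would go here.
    return np.tanh(x).sum(axis=-1)

def gtta_predict(x, n_aug=32, sigma=0.1):
    """Average predictions over random PCA-subspace perturbations of x."""
    coeffs = (x - mean) @ basis.T                # project into the PCA subspace
    noise = rng.normal(scale=sigma, size=(n_aug, k))
    perturbed = (coeffs + noise) @ basis + mean  # reconstruct perturbed inputs
    return model(perturbed).mean()               # ensemble = mean prediction

x_test = rng.normal(size=16)
print(gtta_predict(x_test))
```

Averaging over many small random perturbations in the principal subspace is what the abstract credits with filtering structural and systematic input noise and reducing estimator error.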
Problem

Research questions and friction points this paper is trying to address.

Improving model performance with Generalized Test-Time Augmentation (GTTA)
Filtering noise and reducing errors via PCA subspace perturbation
Reducing computational cost with self-supervised student-teacher learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random PCA subspace perturbation for robust ensembles
Self-supervised distillation to reduce computational cost
Generalized Test-Time Augmentation for multiple tasks
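The distillation stage listed above can also be sketched in a few lines. This is a hedged illustration under assumed details, not the paper's training procedure: the ensemble stand-in, the linear least-squares student, and all names here are hypothetical. The idea it demonstrates is the one stated in the abstract: the ensemble's outputs on unlabeled test inputs act as pseudo-labels for training a single lightweight student, so inference needs only one forward pass.

```python
import numpy as np

rng = np.random.default_rng(1)

def teacher_ensemble(X):
    # Stand-in for the GTTA ensemble: averages several perturbed predictions.
    preds = [np.tanh(X + rng.normal(scale=0.05, size=X.shape)).sum(axis=1)
             for _ in range(8)]
    return np.mean(preds, axis=0)

# Unlabeled test inputs; the ensemble's outputs become pseudo-labels.
X_unlabeled = rng.normal(size=(500, 16))
pseudo_labels = teacher_ensemble(X_unlabeled)

# Lightweight student: a linear model fit to the teacher's outputs
# (least squares stands in for whatever student training the paper uses).
A = np.hstack([X_unlabeled, np.ones((500, 1))])  # add a bias column
w, *_ = np.linalg.lstsq(A, pseudo_labels, rcond=None)

def student(x):
    # Single forward pass at inference time; no ensemble needed.
    return np.append(x, 1.0) @ w

x_new = rng.normal(size=16)
print(student(x_new))
```

Because the teacher provides the targets, no ground-truth labels are required; this is what makes the stage self-supervised and what removes the ensemble's cost from inference.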