🤖 AI Summary
This work identifies and theoretically analyzes Domain-Sensitivity Collapse (DSC), a phenomenon in which models trained on a single domain collapse their features into a low-rank class subspace, degrading out-of-distribution (OOD) detection. To mitigate this, the authors propose Teacher-Guided Training (TGT), which leverages a frozen multi-domain pretrained teacher model (DINOv2) during training. TGT employs an auxiliary head to distill the residual structure suppressed by class supervision, thereby recovering directions sensitive to domain shift. Notably, TGT introduces no additional inference overhead and achieves substantial improvements across eight single-domain benchmarks, reducing far-OOD false positive rates (FPR@95, ResNet-50 average) by over 10 percentage points, while maintaining or slightly improving in-domain OOD detection performance and classification accuracy.
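The core TGT mechanism described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the teacher features are random stand-ins for frozen DINOv2 embeddings, the "class-suppressed residual" is approximated by projecting teacher features off the span of their class means, and the auxiliary head is a single linear map (all names, shapes, and the loss form are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: student backbone features, frozen-teacher features, class labels.
# Shapes and names are illustrative assumptions, not the paper's code.
B, d_s, d_t, C = 8, 16, 32, 4
student_feats = rng.normal(size=(B, d_s))
teacher_feats = rng.normal(size=(B, d_t))   # stand-in for frozen DINOv2 output
labels = rng.integers(0, C, size=B)

def class_suppressed_residual(feats, labels, num_classes):
    """Project features off the span of the class means, keeping the
    residual structure that class supervision tends to suppress."""
    means = np.stack([
        feats[labels == c].mean(axis=0) if np.any(labels == c)
        else np.zeros(feats.shape[1])
        for c in range(num_classes)
    ])
    q, _ = np.linalg.qr(means.T)        # orthonormal basis of class-mean span
    return feats - (feats @ q) @ q.T    # component orthogonal to that span

residual = class_suppressed_residual(teacher_feats, labels, C)

# Auxiliary head (training-only, discarded afterwards): a linear map from
# student space to teacher space, trained to match the residual.
W_aux = rng.normal(scale=0.1, size=(d_s, d_t))
pred = student_feats @ W_aux

# Distillation term that would be added to the usual cross-entropy loss.
aux_loss = np.mean((pred - residual) ** 2)
```

Because the auxiliary head and teacher exist only during training, dropping them afterwards leaves the student's inference path unchanged, which is how TGT avoids any inference overhead.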
📝 Abstract
Out-of-distribution (OOD) detection methods perform well on multi-domain benchmarks, yet many practical systems are trained on single-domain data. We show that this regime induces a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features into a low-rank class subspace and suppresses directions that carry domain-shift signal. We provide theory showing that, under DSC, distance- and logit-based OOD scores lose sensitivity to domain shift. We then introduce Teacher-Guided Training (TGT), which distills class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training. The teacher and auxiliary head are discarded after training, adding no inference overhead. Across eight single-domain benchmarks, TGT yields large far-OOD FPR@95 reductions for distance-based scorers: MDS improves by 11.61 pp, ViM by 10.78 pp, and kNN by 12.87 pp (ResNet-50 average), while maintaining or slightly improving in-domain OOD and classification accuracy.
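For readers unfamiliar with the reported metrics, the following is a minimal, self-contained sketch of one of the distance-based scorers named above (kNN) together with the FPR@95 metric. The Gaussian features and the choice of k are synthetic assumptions for illustration only; they do not reproduce the paper's numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy in-distribution (ID) and shifted out-of-distribution (OOD) features,
# synthetic stand-ins for penultimate-layer embeddings.
train_feats = rng.normal(loc=0.0, size=(200, 8))
id_test = rng.normal(loc=0.0, size=(100, 8))
ood_test = rng.normal(loc=3.0, size=(100, 8))   # mean-shifted "far-OOD" domain

def knn_score(x, bank, k=10):
    """kNN OOD score: distance to the k-th nearest training feature
    (higher score = more OOD-like)."""
    d = np.linalg.norm(bank[None, :, :] - x[:, None, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

def fpr_at_95(id_scores, ood_scores):
    """FPR@95: fraction of OOD samples falling below the score threshold
    that retains 95% of ID samples as in-distribution."""
    thresh = np.quantile(id_scores, 0.95)
    return float(np.mean(ood_scores <= thresh))

fpr = fpr_at_95(knn_score(id_test, train_feats),
                knn_score(ood_test, train_feats))
```

A percentage-point reduction in this quantity, as reported for MDS, ViM, and kNN, means fewer OOD inputs are mistaken for in-distribution at a fixed 95% ID retention rate.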