AI Summary
Semi-supervised regression (SSR) suffers from two key challenges: high sensitivity to pseudo-label quality and a propensity for overfitting in direct regression. To address these, we propose DRILL, a novel framework that reformulates continuous regression as discrete distribution estimation and introduces a decoupled representation distillation architecture. DRILL employs a teacher-student joint learning paradigm with a decoupled distribution alignment mechanism: it separately aligns target and non-target bin distributions, thereby mitigating pseudo-label bias while preserving consistency regularization. This design enhances the stability and robustness of knowledge transfer. Crucially, DRILL performs end-to-end optimization of discrete distribution predictions without requiring post-hoc calibration. Extensive experiments across diverse benchmark datasets demonstrate that DRILL consistently outperforms state-of-the-art SSR methods, validating its strong generalization capability and superior performance.
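The decoupled alignment idea above can be sketched as follows. Note this is a hypothetical illustration: the summary does not specify DRILL's exact loss, so the split into a binary target/non-target term plus a KL term over the remaining bins (modeled on decoupled knowledge distillation) is an assumption, as are the function names.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decoupled_alignment_loss(teacher_logits, student_logits, target_bin):
    """Align target and non-target bin distributions separately.

    Hypothetical sketch, not the paper's published loss.
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    eps = 1e-12

    # Target-bin term: binary cross-entropy between the probability mass
    # each model assigns to the target bin vs. everything else.
    t_t, t_s = p_t[target_bin], p_s[target_bin]
    target_term = -(t_t * np.log(t_s + eps)
                    + (1 - t_t) * np.log(1 - t_s + eps))

    # Non-target term: KL divergence between teacher and student
    # distributions renormalized over the non-target bins only, so the
    # teacher's relative ranking of wrong bins is transferred on its own.
    mask = np.ones_like(p_t, dtype=bool)
    mask[target_bin] = False
    q_t = p_t[mask] / (p_t[mask].sum() + eps)
    q_s = p_s[mask] / (p_s[mask].sum() + eps)
    nontarget_term = np.sum(q_t * np.log((q_t + eps) / (q_s + eps)))

    return target_term + nontarget_term
```

Because the two terms are computed on renormalized distributions, the target-bin agreement and the shape of the non-target distribution can be weighted independently, which is what lets this style of alignment tolerate a biased pseudo-label for the target bin.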
Abstract
Semi-supervised regression (SSR), which aims to predict continuous scores for samples while reducing reliance on large amounts of labeled data, has recently received considerable attention across various applications, including computer vision, natural language processing, and audio and medical analysis. Existing semi-supervised methods typically apply consistency regularization to the general regression task by generating pseudo-labels. However, these methods rely heavily on the quality of the pseudo-labels, and direct regression fails to learn the label distribution and can easily lead to overfitting. To address these challenges, we introduce DRILL, an end-to-end Decoupled Representation distillation framework designed specifically for semi-supervised regression. DRILL transforms the general regression task into a Discrete Distribution Estimation (DDE) task over multiple buckets, which better captures the underlying label distribution and mitigates the overfitting risk associated with direct regression. We then employ Decoupled Distribution Alignment (DDA) to align the target bucket and the non-target buckets between teacher and student over the bucket distribution, encouraging the student to learn more robust and generalized knowledge from the teacher. Extensive experiments on datasets from diverse domains demonstrate that DRILL generalizes well and outperforms competing methods.
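The DDE step, turning a continuous label into a distribution over buckets and decoding a score back out, can be sketched as below. This is a hedged illustration: the abstract does not specify DRILL's discretization, so the Gaussian soft-label scheme, the `sigma` parameter, and the expectation-based decoding are assumptions, though the decoding matches the stated property of needing no post-hoc calibration.

```python
import numpy as np

def label_to_distribution(y, bin_centers, sigma=0.5):
    """Convert a continuous label y into a soft distribution over discrete
    buckets using a Gaussian kernel centered at the true value.

    Hypothetical sketch; sigma controls how much probability mass
    spreads to neighboring buckets.
    """
    logits = -((bin_centers - y) ** 2) / (2 * sigma ** 2)
    logits -= logits.max()          # numerical stability before exp
    dist = np.exp(logits)
    return dist / dist.sum()        # normalize to a valid distribution

def distribution_to_score(dist, bin_centers):
    """Decode a predicted bucket distribution back to a continuous score
    as the expectation over bin centers."""
    return float(np.dot(dist, bin_centers))
```

For example, with `bin_centers = np.linspace(0, 10, 101)` (step 0.1), a label of 3.2 becomes a small Gaussian bump over the buckets near 3.2, and taking the expectation recovers a score very close to 3.2, so the model can be trained and evaluated entirely in distribution space.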