Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning

📅 2026-05-08

🤖 AI Summary

This work addresses semi-supervised learning in realistic settings where labeled data are scarce and the distribution of the unlabeled data is unknown. Conventional approaches rely on distributional assumptions that produce erroneous pseudo-labels, which in turn cause representation confusion. To overcome this, the authors propose SAGE, a method that combines structural inference with a simplex anchoring mechanism. SAGE builds a structural consensus from high-order relationships among samples and uses vectors forming a simplex equiangular tight frame (ETF) as anchors that enforce separation between class representations. It further introduces a pseudo-label weighting strategy based on distribution-agnostic metrics and an auxiliary branch that isolates potentially erroneous pseudo-labels, so that reliable signals are prioritized and noise is suppressed. Without any assumption about the unlabeled data distribution, SAGE achieves an average accuracy gain of 8.52% across five standard benchmarks, consistently outperforming state-of-the-art methods.
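The idea of a structural consensus built from high-order sample relationships can be illustrated with a generic construction: a cosine k-nearest-neighbour graph over representations, whose multi-hop random-walk affinities are used to smooth pseudo-label distributions. This is a minimal sketch under stated assumptions, not SAGE's graph-state equipartition procedure; the functions `knn_affinity`, `high_order_affinity`, and `propagate_labels` and the parameters `k`, `hops`, and `alpha` are hypothetical.

```python
import numpy as np


def knn_affinity(features: np.ndarray, k: int) -> np.ndarray:
    """Symmetric kNN graph weighted by non-negative cosine similarity (hypothetical helper)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbour
    n = sim.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    rows = np.arange(n)[:, None]
    w = np.zeros((n, n))
    w[rows, nbrs] = np.clip(sim[rows, nbrs], 0.0, None)  # drop negative similarities
    return np.maximum(w, w.T)  # keep an edge if either endpoint selected it


def high_order_affinity(w: np.ndarray, hops: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Decayed random-walk sum P + alpha*P^2 + ... + alpha^(hops-1)*P^hops (assumed form)."""
    deg = w.sum(axis=1, keepdims=True)
    p = w / np.where(deg > 0, deg, 1.0)  # row-stochastic transition matrix
    step = np.eye(w.shape[0])
    total = np.zeros_like(p)
    for h in range(1, hops + 1):
        step = step @ p  # h-hop transition probabilities
        total += alpha ** (h - 1) * step
    return total


def propagate_labels(affinity: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Replace each pseudo-label distribution by an affinity-weighted average."""
    smoothed = affinity @ probs
    return smoothed / smoothed.sum(axis=1, keepdims=True)


# Two well-separated clusters of 2-D "representations" (toy data).
features = np.array([[1.0, 0.1], [1.0, -0.1], [1.0, 0.05],
                     [0.1, 1.0], [-0.1, 1.0], [0.05, 1.0]])
# Pseudo-label distributions; sample 2 is wrongly labelled as class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
                  [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]])
affinity = high_order_affinity(knn_affinity(features, k=2))
smoothed = propagate_labels(affinity, probs)
print(smoothed.argmax(axis=1))
```

In this toy example the wrongly labelled sample sits among two confidently class-0 neighbours, so its smoothed distribution flips to class 0, while no affinity leaks between the two clusters. Summing decayed powers of the transition matrix is one standard way to reach beyond immediate neighbours; the decay `alpha` keeps distant hops from dominating.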

📝 Abstract

Semi-supervised learning faces significant challenges in realistic scenarios where labeled data is scarce and unlabeled data follows unknown, arbitrary distributions. We formalize this critical yet under-explored paradigm as Universal Semi-supervised Learning (UniSSL). Existing methods typically leverage unlabeled data via pseudo-labeling. However, they often rely on the idealized assumption of a uniform unlabeled data distribution or require sufficient labeled data to estimate it. In the UniSSL setting, such dependencies lead to numerous erroneous pseudo-labels, thereby triggering representation confusion. Fortunately, we observe that inter-sample relations captured by representations are more reliable than pseudo-labels. Leveraging this insight, we shift our focus to representation-level structural inference to bypass distribution estimation. Accordingly, we propose Simplex Anchored Graph-state Equipartition (SAGE), which captures high-order inter-sample dependencies to establish structural consensus for guiding representation learning. Meanwhile, to mitigate representation confusion, we employ vectors that satisfy a simplex equiangular tight frame to serve as a coordinate frame for guiding inter-class representation separation. Finally, we introduce a weighting strategy based on distribution-agnostic metrics to prioritize reliable pseudo-labels and an auxiliary branch to isolate potentially erroneous pseudo-labels. Evaluations on five standard benchmarks show that SAGE consistently outperforms state-of-the-art methods, with an average accuracy gain of 8.52%.
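A simplex equiangular tight frame for K classes is a set of K unit vectors whose pairwise inner products all equal -1/(K-1), the most separated arrangement K unit vectors can have. One standard construction is M = sqrt(K/(K-1)) · U (I_K - (1/K) 1 1ᵀ), where U is any d×K matrix with orthonormal columns (d ≥ K). The sketch below builds such a frame; the function names, the random choice of U, and the scaled-cosine `anchor_logits` helper (a common way to use fixed anchors as a classifier) are assumptions for illustration, not the paper's exact usage.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF."""
    if num_classes < 2 or dim < num_classes:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR gives U with orthonormal columns, shape (dim, num_classes).
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k  # removes the common mean direction
    return np.sqrt(k / (k - 1)) * u @ centering


def anchor_logits(features: np.ndarray, anchors: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Hypothetical helper: scaled cosine similarity between features and fixed anchors."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    return scale * z @ anchors  # anchor columns already have unit norm


anchors = simplex_etf(num_classes=10, dim=64)
gram = anchors.T @ anchors  # diagonal 1, off-diagonal -1/9
print(np.round(gram[0, :3], 4))
```

Because the anchors are fixed rather than learned, they cannot drift toward one another when pseudo-labels are noisy; pulling each representation toward its class anchor therefore preserves a guaranteed angular gap between classes.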

Problem

Research questions and friction points this paper is trying to address.

Semi-supervised learning

Universal Semi-supervised Learning

Unlabeled data distribution

Pseudo-labeling

Representation confusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Universal Semi-supervised Learning

Structural Inference

Simplex Equiangular Tight Frame