🤖 AI Summary
This paper addresses learning a predictor from an unpaired input dataset X and an unpaired output dataset Y under low supervision—i.e., with only a few labeled (X,Y) pairs—aiming for sparse, interpretable cross-domain alignment. Methodologically, it first clusters X and Y separately, then constructs a sparse inter-cluster bridging structure that explicitly leverages output-only data; this enables highly interpretable, cluster-level mappings learned from minimal paired samples. Unlike conventional semi-supervised or optimal-transport approaches, the framework is model-agnostic, requires no joint X–Y representation learning, and enjoys both theoretical tractability and computational efficiency. Experiments demonstrate that it matches state-of-the-art performance in low-label regimes, generalizes well across domains, and that its bridging structure supports intuitive attribution and diagnostic analysis.
📝 Abstract
We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input dataset $X$ and output dataset $Y$. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
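The procedure described in the abstract—cluster $X$ and $Y$ independently, learn a sparse bridge from a few paired examples, then predict via the linked output cluster's centroid—can be sketched roughly as follows. This is a minimal illustration assuming k-means for both clustering steps and a majority-vote bridge; the function names, cluster counts, and voting rule are our assumptions, not the paper's implementation.

```python
# Illustrative sketch of Bridged Clustering (assumptions: k-means clustering,
# majority-vote bridge). Not the authors' reference implementation.
import numpy as np
from sklearn.cluster import KMeans

def fit_bridged_clustering(X_unpaired, Y_unpaired, X_paired, Y_paired,
                           k_x=3, k_y=3, seed=0):
    # Step 1: cluster the unpaired input and output datasets independently.
    km_x = KMeans(n_clusters=k_x, random_state=seed, n_init=10).fit(X_unpaired)
    km_y = KMeans(n_clusters=k_y, random_state=seed, n_init=10).fit(Y_unpaired)
    # Step 2: learn a sparse bridge from the few paired examples.
    # Each input cluster is linked to the output cluster that wins a
    # majority vote among the paired samples falling in that input cluster.
    cx = km_x.predict(X_paired)
    cy = km_y.predict(Y_paired)
    bridge = {}
    for c in range(k_x):
        votes = cy[cx == c]
        bridge[c] = int(np.bincount(votes, minlength=k_y).argmax()) if votes.size else 0
    return km_x, km_y, bridge

def predict(x, km_x, km_y, bridge):
    # Step 3: assign x to its nearest input cluster, return the centroid
    # of the linked output cluster as the prediction.
    c = km_x.predict(np.atleast_2d(x))[0]
    return km_y.cluster_centers_[bridge[c]]
```

The bridge is a plain dictionary mapping input-cluster indices to output-cluster indices, which makes the learned alignment sparse and directly inspectable, in line with the interpretability claim in the abstract.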