🤖 AI Summary
This paper addresses learning a predictor from an unpaired input dataset X and an unpaired output dataset Y under low supervision—i.e., with only a few labeled (X,Y) pairs—aiming for sparse, interpretable cross-domain alignment. Methodologically, it first clusters X and Y separately, then constructs a sparse inter-cluster bridging structure that explicitly leverages output-only data; this enables highly interpretable, cluster-level mappings learned from minimal paired samples. Unlike conventional semi-supervised or optimal-transport approaches, the framework is model-agnostic, requires no joint X–Y representation learning, and enjoys both theoretical tractability and computational efficiency. Experiments demonstrate that it matches state-of-the-art performance in low-label regimes, generalizes well across domains, and that its bridging structure supports intuitive attribution and diagnostic analysis.
📝 Abstract
We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input dataset $X$ and output dataset $Y$. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
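The procedure described in the abstract—cluster $X$ and $Y$ independently, learn a sparse bridge from a few paired examples, then predict via the linked output cluster's centroid—can be sketched roughly as follows. This is a minimal illustration assuming k-means for both clustering steps and a majority-vote bridge; the function names, cluster counts, and voting rule are our assumptions, not the paper's implementation.

```python
# Illustrative sketch of Bridged Clustering (assumptions: k-means clustering,
# majority-vote bridge). Not the authors' reference implementation.
import numpy as np
from sklearn.cluster import KMeans

def fit_bridged_clustering(X_unpaired, Y_unpaired, X_paired, Y_paired,
                           k_x=3, k_y=3, seed=0):
    # Step 1: cluster the unpaired input and output datasets independently.
    km_x = KMeans(n_clusters=k_x, random_state=seed, n_init=10).fit(X_unpaired)
    km_y = KMeans(n_clusters=k_y, random_state=seed, n_init=10).fit(Y_unpaired)
    # Step 2: learn a sparse bridge from the few paired examples.
    # Each input cluster is linked to the output cluster that wins a
    # majority vote among the paired samples falling in that input cluster.
    cx = km_x.predict(X_paired)
    cy = km_y.predict(Y_paired)
    bridge = {}
    for c in range(k_x):
        votes = cy[cx == c]
        bridge[c] = int(np.bincount(votes, minlength=k_y).argmax()) if votes.size else 0
    return km_x, km_y, bridge

def predict(x, km_x, km_y, bridge):
    # Step 3: assign x to its nearest input cluster, return the centroid
    # of the linked output cluster as the prediction.
    c = km_x.predict(np.atleast_2d(x))[0]
    return km_y.cluster_centers_[bridge[c]]
```

The bridge is a plain dictionary mapping input-cluster indices to output-cluster indices, which makes the learned alignment sparse and directly inspectable, in line with the interpretability claim in the abstract.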