🤖 AI Summary
This work addresses a key limitation of existing label-supervised manifold alignment methods: they rely on Euclidean geometry to model intra-domain relationships, which introduces semantically misleading structure when features are only weakly related to the downstream task and thereby degrades alignment quality. To overcome this, we propose FoSTA (Forest-guided Semantic Transport Alignment), a scalable framework that brings forest-induced geometry into manifold alignment. FoSTA uses random forests trained on the shared labels to build task-relevant semantic representations from forest affinities, then aligns those representations across domains with a fast, hierarchical semantic transport step. This denoises intra-domain structure and recovers task-relevant relationships before alignment. In comparisons with established baselines, FoSTA improves correspondence recovery and label transfer on synthetic benchmarks and performs strongly on single-cell multi-omics applications, including batch correction and biological conservation.
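To make "forest-induced geometry" concrete, here is a minimal sketch of the classic random forest proximity: the affinity between two samples is the fraction of trees in which they land in the same leaf. This is an assumption for illustration only; the text does not say which forest affinity FoSTA uses. The `leaves` matrix is hypothetical toy data. In practice it could come from a fitted forest, for example via scikit-learn's `RandomForestClassifier.apply`, which returns one leaf index per sample and tree.

```python
from typing import List, Sequence


def forest_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return an n x n affinity matrix from per-tree leaf assignments.

    leaves[i][t] is the leaf reached by sample i in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which samples i and j share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples, 3 trees.
leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [5, 4, 3],
    [5, 4, 1],
]
P = forest_proximity(leaves)
```

Because the trees are grown to predict the labels, two samples share leaves only when they agree on the features the forest found useful, so weakly task-related features contribute little to this affinity.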
📝 Abstract
Label-supervised manifold alignment bridges the gap between unsupervised and correspondence-based paradigms by leveraging shared label information to align multimodal datasets. Still, most existing methods rely on Euclidean geometry to model intra-domain relationships. This approach can fail when features are only weakly related to the task of interest, leading to noisy, semantically misleading structure and degraded alignment quality. To address this limitation, we introduce FoSTA (Forest-guided Semantic Transport Alignment), a scalable alignment framework that leverages forest-induced geometry to denoise intra-domain structure and recover task-relevant manifolds prior to alignment. FoSTA builds semantic representations directly from label-informed forest affinities and aligns them via fast, hierarchical semantic transport, capturing meaningful cross-domain relationships. Extensive comparisons with established baselines demonstrate that FoSTA improves correspondence recovery and label transfer on synthetic benchmarks and delivers strong performance in practical single-cell applications, including batch correction and biological conservation.
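The transport step can be illustrated with plain entropic optimal transport (Sinkhorn iterations). This is only a rough sketch under stated assumptions: the abstract does not specify FoSTA's hierarchical scheme, and none of it is reproduced here. The `cost` matrix is hypothetical and stands in for distances between the semantic representations of samples in two domains; `eps` and `n_iter` are illustrative settings.

```python
import math
from typing import List


def sinkhorn(cost: List[List[float]], eps: float = 0.2,
             n_iter: int = 500) -> List[List[float]]:
    """Entropic optimal transport with uniform marginals.

    Returns a transport plan whose entry (i, j) is the mass moved from
    sample i in the first domain to sample j in the second.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on first-domain samples
    b = [1.0 / m] * m  # uniform mass on second-domain samples
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical cross-domain cost: 3 samples per domain.
cost = [
    [0.1, 0.9, 0.8],
    [0.9, 0.2, 0.7],
    [0.8, 0.7, 0.1],
]
plan = sinkhorn(cost)
# Read off the most likely counterpart of each first-domain sample.
matches = [max(range(len(row)), key=row.__getitem__) for row in plan]
```

Taking the row-wise maximum of the plan gives a hard correspondence; the soft plan itself can also be used to transfer labels by weighting each target label by the transported mass.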