🤖 AI Summary
Traditional manifold alignment (MA) methods lack out-of-sample generalization capability, limiting their applicability in real-world cross-domain scenarios. To address this, we propose a geometrically regularized dual-branch autoencoder framework that jointly optimizes embedding conformality, cross-modal structural consistency, and reconstruction fidelity via pre-trained alignment guidance and multi-task learning. Crucially, we embed manifold geometric constraints directly into the autoencoder’s latent space, enabling zero-shot out-of-sample extension without retraining. Extensive experiments on multiple benchmark datasets demonstrate significant improvements in cross-domain embedding consistency, information preservation, and transfer performance. In a multimodal Alzheimer’s disease diagnosis task—integrating PET, MRI, and clinical data—our method achieves a 4.2% absolute gain in prediction accuracy, validating its strong generalizability and clinical utility.
📝 Abstract
Manifold alignment (MA) comprises a set of techniques for learning shared representations across domains, yet many traditional MA methods cannot perform out-of-sample extension, which limits their real-world applicability. We propose a guided representation learning framework leveraging a geometry-regularized twin autoencoder (AE) architecture that enhances MA while generalizing to unseen data. Our method enforces structured cross-modal mappings to maintain geometric fidelity in the learned embeddings. By incorporating a pre-trained alignment model and a multitask learning formulation, we improve cross-domain generalization and representation robustness while maintaining alignment fidelity. We evaluate our approach using several MA methods, showing improvements in embedding consistency, information preservation, and cross-domain transfer. Finally, we apply the framework to Alzheimer's disease diagnosis, demonstrating that it can integrate multi-modal patient data and improve predictive accuracy even when only a single modality is available at inference, by leveraging insights learned from the multi-modal problem.
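The multitask objective described above combines three terms: reconstruction fidelity for each AE branch, cross-modal alignment of paired samples in the shared latent space, and a geometric regularizer that preserves within-domain pairwise structure. The paper does not specify the exact loss form here, so the following is a minimal numpy sketch under the assumption that alignment is a paired latent distance penalty and the geometric term matches normalized pairwise-distance matrices; the function name and weights are illustrative, not from the paper.

```python
import numpy as np

def multitask_loss(x_a, x_b, z_a, z_b, xhat_a, xhat_b,
                   w_rec=1.0, w_align=1.0, w_geo=1.0):
    """Hypothetical combined loss for a geometry-regularized twin AE.

    x_a, x_b      : paired inputs from the two modalities (n x d_a, n x d_b)
    z_a, z_b      : their latent embeddings (n x k each)
    xhat_a, xhat_b: reconstructions from each branch's decoder
    """
    # 1) Reconstruction fidelity for both branches (mean squared error).
    rec = np.mean((x_a - xhat_a) ** 2) + np.mean((x_b - xhat_b) ** 2)

    # 2) Cross-modal alignment: paired samples should embed close together.
    align = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    # 3) Geometric regularization: preserve each modality's normalized
    #    pairwise-distance structure in the latent space.
    def geo(x, z):
        dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return np.mean((dx / dx.max() - dz / dz.max()) ** 2)

    return (w_rec * rec + w_align * align
            + w_geo * (geo(x_a, z_a) + geo(x_b, z_b)))
```

In a full training loop the embeddings and reconstructions would come from the two AE branches, with the pre-trained alignment model supplying target correspondences; this sketch only makes explicit how the three loss terms trade off via their weights.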