🤖 AI Summary
This paper investigates the fundamental limits and algorithmic design of joint inference in high-dimensional multimodal learning: given two noisy, correlated spiked data matrices, how can shared latent variables be optimally recovered? First, it rigorously characterizes the Bayesian optimal recovery threshold under generalized priors and heterogeneous noise. Second, it proves that classical methods—Partial Least Squares (PLS) and Canonical Correlation Analysis (CCA)—exhibit suboptimal phase transitions and fail to achieve this theoretical limit. Third, it proposes a joint estimation algorithm based on Approximate Message Passing (AMP), with rigorous performance characterization via state evolution analysis and numerical validation. The AMP algorithm achieves Bayesian-optimal recovery with linear time complexity, significantly outperforming PLS and CCA. Collectively, this work establishes precise statistical boundaries for multimodal fusion and provides a principled algorithmic pathway to attain them.
📝 Abstract
This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The analysis holds for a broad range of priors and noise channels, which can differ across modalities. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis methods, which are both observed to suffer from a sub-optimal recovery threshold.