Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning

📅 2026-01-29
🤖 AI Summary
This work studies how to extract shared latent structure from two-view multimodal data when entries are missing completely at random. Under a proportional high-dimensional spiked model, the authors show that the suitably normalized masked cross-covariance matrix behaves like a spiked rectangular random matrix whose signal strength is attenuated by $\sqrt{\rho}$, where $\rho$ is the joint entry retention probability. This attenuation produces a missingness-induced Baik–Ben Arous–Péché (BBP)-type phase transition. Using tools from high-dimensional random matrix theory, the authors derive the critical signal-to-noise threshold and closed-form expressions for the asymptotic alignment between the leading singular vectors of spectral partial least squares (PLS-SVD) and the underlying shared directions. Simulations and semi-synthetic multimodal experiments across aspect ratios, signal strengths, and missing rates agree closely with the predicted phase boundary and recovery curves.
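The summary does not state the formulas themselves. As a hedged illustration of what a closed-form threshold and overlap look like, the sketch below uses the standard limits for a rank-one rectangular spiked model (the same expressions that appear in optimal singular value shrinkage), with the spike strength θ replaced by θ√ρ as the √ρ attenuation suggests. The normalization (an n × m matrix, noise entries of variance 1/m, aspect ratio c = n/m ≤ 1) and the function names `critical_theta` and `spiked_rectangular_limits` are assumptions for illustration, so the paper's own constants may differ.

```python
import math


def critical_theta(rho: float, c: float) -> float:
    """Smallest detectable spike strength theta under sqrt(rho) attenuation.

    Assumed model: the effective strength theta * sqrt(rho) must exceed c ** 0.25.
    """
    return c ** 0.25 / math.sqrt(rho)


def spiked_rectangular_limits(theta: float, rho: float, c: float):
    """Asymptotic (top singular value, left overlap^2, right overlap^2).

    Assumed normalization (illustrative, may differ from the paper): an n x m
    matrix theta * u v^T + noise with i.i.d. noise entries of variance 1/m,
    aspect ratio c = n / m in (0, 1], bulk edge 1 + sqrt(c), and the spike
    strength attenuated to theta * sqrt(rho).
    """
    t = theta * math.sqrt(rho)  # effective (attenuated) signal strength
    if t <= c ** 0.25:
        # Below the BBP-type threshold: the top singular value sticks to the
        # bulk edge and the singular vectors carry no asymptotic information.
        return 1.0 + math.sqrt(c), 0.0, 0.0
    t2 = t * t
    sigma = math.sqrt((1.0 + t2) * (c + t2)) / t
    left = 1.0 - c * (1.0 + t2) / (t2 * (t2 + c))  # n-dimensional side
    right = 1.0 - (c + t2) / (t2 * (t2 + 1.0))     # m-dimensional side
    return sigma, left, right


if __name__ == "__main__":
    c, theta = 0.5, 2.0
    for rho in (1.0, 0.5, 0.25, 0.1):
        sigma, left, right = spiked_rectangular_limits(theta, rho, c)
        print(f"rho={rho:.2f} theta_c={critical_theta(rho, c):.2f} "
              f"sigma={sigma:.3f} left={left:.3f} right={right:.3f}")
```

Under these assumptions the threshold scales as c^(1/4)/√ρ, so with c = 1 halving the retention probability raises the required signal strength by a factor of √2.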


📝 Abstract
Partial Least Squares (PLS) learns shared structure from paired data via the top singular vectors of the empirical cross-covariance (PLS-SVD), but multimodal datasets often have missing entries in both views. We study PLS-SVD under independent entry-wise missing-completely-at-random masking in a proportional high-dimensional spiked model. After appropriate normalization, the masked cross-covariance behaves like a spiked rectangular random matrix whose effective signal strength is attenuated by $\sqrt{\rho}$, where $\rho$ is the joint entry retention probability. As a result, PLS-SVD exhibits a sharp BBP-type phase transition: below a critical signal-to-noise threshold the leading singular vectors are asymptotically uninformative, while above it they achieve nontrivial alignment with the latent shared directions, with closed-form asymptotic overlap formulas. Simulations and semi-synthetic multimodal experiments corroborate the predicted phase diagram and recovery curves across aspect ratios, signal strengths, and missingness levels.
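To see the √ρ attenuation empirically, the following hedged sketch simulates a rank-one two-view model, applies independent entry-wise MCAR masks with retention probabilities `rho_x` and `rho_y` (taking ρ = `rho_x` · `rho_y` as the joint retention probability), zero-fills the missing entries, rescales the masked cross-covariance by 1/(nρ) so that its mean equals the population cross-covariance, and measures the squared overlap of the top PLS-SVD singular vectors with the true shared directions. The generative model, the rescaling, and the function name `masked_pls_svd_overlap` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def masked_pls_svd_overlap(n, p, q, beta, rho_x, rho_y, seed=0):
    """Return squared overlaps (left, right) of the top PLS-SVD singular vectors.

    Assumed model (illustrative, not necessarily the paper's exact setup):
        x_i = sqrt(beta) * z_i * a + noise_x,   y_i = sqrt(beta) * z_i * b + noise_y
    with z_i ~ N(0, 1), unit-norm directions a, b, and standard Gaussian noise,
    so the population cross-covariance is beta * a b^T.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(p)
    a /= np.linalg.norm(a)
    b = rng.standard_normal(q)
    b /= np.linalg.norm(b)
    z = rng.standard_normal(n)
    X = np.sqrt(beta) * np.outer(z, a) + rng.standard_normal((n, p))
    Y = np.sqrt(beta) * np.outer(z, b) + rng.standard_normal((n, q))

    # Independent entry-wise MCAR masks; missing entries are zero-filled.
    mask_x = rng.random((n, p)) < rho_x
    mask_y = rng.random((n, q)) < rho_y
    X_obs, Y_obs = X * mask_x, Y * mask_y

    # Each product x_ij * y_ik is observed with probability rho = rho_x * rho_y,
    # so dividing by n * rho makes the masked cross-covariance unbiased.
    rho = rho_x * rho_y
    C = (X_obs.T @ Y_obs) / (n * rho)

    U, _, Vt = np.linalg.svd(C)
    return float((U[:, 0] @ a) ** 2), float((Vt[0] @ b) ** 2)


if __name__ == "__main__":
    for r in (1.0, 0.5, 0.2):
        for beta in (0.5, 1.0, 2.0, 4.0):
            left, right = masked_pls_svd_overlap(800, 200, 200, beta, r, r)
            print(f"rho={r * r:.2f} beta={beta:.1f} left={left:.2f} right={right:.2f}")
```

Sweeping `beta` for several retention levels should show overlaps near zero below a missingness-dependent threshold and rising toward one above it; exact values depend on the seed and on the finite dimensions.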
Problem

Research questions and friction points this paper is trying to address.

missing data
phase transition
Partial Least Squares
multimodal learning
high-dimensional statistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

phase transition
spectral PLS
missing data
spiked random matrix
multimodal learning
Anders Gjølbye
Technical University of Denmark
Explainability, Deep Learning, EEG
Ida Kargaard
Technical University of Denmark, Section for Cognitive Systems, 2800 Kgs. Lyngby, Denmark
Emma Kargaard
Technical University of Denmark, Section for Cognitive Systems, 2800 Kgs. Lyngby, Denmark
Lars Kai Hansen
Professor, Cognitive Systems, DTU Compute, Technical University of Denmark
Machine learning, AI, neuroimaging, cognitive systems, signal processing