🤖 AI Summary
This work investigates the Bayes-optimal recovery limit for structured matrices of potentially large rank (for example, a product of two matrices whose sizes are proportional to the dimension) in high-dimensional matrix sensing. Focusing on the asymptotic regime where the number of measurements is proportional to the number of matrix entries, we give a sharp characterization of Bayes-optimal learning in generalized linear models with Gaussian data and structured matrix priors. We prove universality properties for structured sensing matrices, related to the "Gaussian equivalence" phenomenon, and combine them with earlier results on matrix denoising to derive rigorous asymptotic equations for the Bayes-optimal learning performance. These results mathematically establish statistical-physics predictions from [ETB+24] on Bilinear Sequence Regression, a benchmark model for learning from sequences of tokens, and from [MTM+24] on neural networks with quadratic activation and width proportional to the dimension.
📝 Abstract
In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We consider this model in the high-dimensional limit: while previous works primarily focused on the recovery of low-rank matrices, we consider more general classes of structured signal matrices with potentially large rank, e.g. a product of two matrices of sizes proportional to the dimension. We provide rigorous asymptotic equations characterizing the Bayes-optimal learning performance from a number of samples that is proportional to the number of entries in the matrix. Our proof is composed of three key ingredients: (i) we prove universality properties to handle structured sensing matrices, related to the "Gaussian equivalence" phenomenon in statistical learning, (ii) we provide a sharp characterization of Bayes-optimal learning in generalized linear models with Gaussian data and structured matrix priors, generalizing previously studied settings, and (iii) we leverage previous works on the problem of matrix denoising. The generality of our results allows for a variety of applications: notably, we mathematically establish predictions obtained via non-rigorous methods from statistical physics in [ETB+24] regarding Bilinear Sequence Regression, a benchmark model for learning from sequences of tokens, and in [MTM+24] on Bayes-optimal learning in neural networks with a quadratic activation function and width proportional to the dimension.
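To make the observation model concrete, here is a minimal NumPy sketch of synthetic matrix sensing data with a product-structured signal. It is an illustration only and not the paper's construction: the function name `simulate_matrix_sensing`, the i.i.d. Gaussian sensing matrices, the additive Gaussian noise, and the `1/sqrt(m)` and `1/d` normalizations are all assumptions chosen for readability.

```python
import numpy as np


def simulate_matrix_sensing(d=20, m=10, alpha=0.5, noise_std=0.1, seed=0):
    """Generate (S, A, y) for a toy matrix sensing problem (hypothetical helper).

    S is a d x d signal built as a product of two d x m Gaussian factors,
    so its rank is min(d, m) and grows with d when m is proportional to d.
    The number of observations n = alpha * d^2 is proportional to the
    number of entries of S.
    """
    rng = np.random.default_rng(seed)

    # Structured signal: product of two factors (normalization is an assumption).
    U = rng.standard_normal((d, m))
    V = rng.standard_normal((d, m))
    S = U @ V.T / np.sqrt(m)

    # Sensing directions: i.i.d. Gaussian matrices (an assumption for this sketch).
    n = int(alpha * d * d)
    A = rng.standard_normal((n, d, d))

    # Noisy linear projections: y_i = Tr(A_i^T S) / d + noise.
    y = np.einsum("nij,ij->n", A, S) / d + noise_std * rng.standard_normal(n)
    return S, A, y


if __name__ == "__main__":
    S, A, y = simulate_matrix_sensing()
    print(S.shape, A.shape, y.shape)
```

The sketch only generates data; it does not implement a Bayes-optimal estimator. Its purpose is to show the scaling regime discussed in the abstract, where both the rank of the signal and the number of samples per matrix entry stay of constant order as `d` grows.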