LEARNER: A Transfer Learning Method for Low-Rank Matrix Estimation

📅 2024-12-29
📈 Citations: 2
Influential: 0
🤖 AI Summary
To improve low-rank matrix estimation in a target population when data are heterogeneous across sources, this paper proposes LEARNER, a transfer learning framework based on latent space similarity. Leveraging similarity between the source and target populations in their latent row and column spaces, the method fits a low-rank approximation of the target data with a penalty on differences between the source and target latent spaces. A cross-validation strategy lets the method adapt to the degree of heterogeneity across populations. Implemented in the R package learner, the method often outperforms the benchmark approach that uses only target data, especially as the signal-to-noise ratio in the source population increases, as shown in extensive simulations and a re-analysis of a genome-wide association study (GWAS) in the BioBank Japan cohort.

📝 Abstract
Low-rank matrix estimation is a fundamental problem in statistics and machine learning. In the context of heterogeneous data generated from diverse sources, a key challenge lies in leveraging data from a source population to enhance the estimation of a low-rank matrix in a target population of interest. One such example is estimating associations between genetic variants and diseases in non-European ancestry groups. We propose an approach that leverages similarity in the latent row and column spaces of the source and target populations to improve estimation in the target population, which we refer to as LatEnt spAce-based tRaNsfer lEaRning (LEARNER). LEARNER performs a low-rank approximation of the target population data that penalizes differences in the latent row and column spaces between the source and target populations. We present a cross-validation approach that allows the method to adapt to the degree of heterogeneity across populations. In extensive simulations, LEARNER often outperforms the benchmark approach that uses only the target population data, especially as the signal-to-noise ratio in the source population increases. We also performed an illustrative application and empirical comparison of LEARNER and benchmark approaches in a re-analysis of a genome-wide association study in the BioBank Japan cohort. LEARNER is implemented in the R package learner.
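The idea of penalizing differences between source and target latent spaces can be illustrated with a small NumPy sketch. This is not the LEARNER objective or the API of the learner R package; it is a simplified convex variant chosen here because it has a closed form. The function names (`top_subspaces`, `subspace_shrunk_estimate`), the penalty form, the single weight `lam`, and the simulated data are all assumptions made for illustration. The sketch estimates the latent row and column spaces from the source matrix, then shrinks the parts of the target matrix that fall outside those spaces.

```python
import numpy as np


def top_subspaces(Y, r):
    """Orthonormal bases of the leading r left/right singular subspaces of Y."""
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r], Vt[:r, :].T


def subspace_shrunk_estimate(Y0, U1, V1, lam):
    """Closed-form minimizer over T of
        ||Y0 - T||_F^2 + lam * ||(I - P_U) T||_F^2 + lam * ||T (I - P_V)||_F^2,
    where P_U and P_V project onto the source row and column spaces.
    (Hypothetical simplified penalty, not the objective from the paper.)
    """
    PU = U1 @ U1.T
    PV = V1 @ V1.T
    QU = np.eye(Y0.shape[0]) - PU  # projector onto the complement of span(U1)
    QV = np.eye(Y0.shape[1]) - PV  # projector onto the complement of span(V1)
    # The objective separates over four orthogonal blocks of T:
    # the block inside both source spaces is left untouched, blocks outside
    # one space are shrunk by 1/(1+lam), and the block outside both by 1/(1+2*lam).
    return (PU @ Y0 @ PV
            + (PU @ Y0 @ QV + QU @ Y0 @ PV) / (1.0 + lam)
            + (QU @ Y0 @ QV) / (1.0 + 2.0 * lam))


# Simulated example (assumed setup): source and target share latent spaces
# but have different singular values; the source is observed with less noise.
rng = np.random.default_rng(0)
p, q, r = 30, 20, 2
U, _ = np.linalg.qr(rng.standard_normal((p, r)))
V, _ = np.linalg.qr(rng.standard_normal((q, r)))
theta_target = U @ np.diag([5.0, 3.0]) @ V.T
theta_source = U @ np.diag([8.0, 6.0]) @ V.T
Y0 = theta_target + rng.standard_normal((p, q))        # noisy target data
Y1 = theta_source + 0.1 * rng.standard_normal((p, q))  # cleaner source data

U1, V1 = top_subspaces(Y1, r)
for lam in (0.0, 1.0, 10.0):
    est = subspace_shrunk_estimate(Y0, U1, V1, lam)
    err = np.linalg.norm(est - theta_target)
    print(f"lam={lam:5.1f}  Frobenius error={err:.3f}")
```

With `lam = 0` the estimate is just the target data, and as `lam` grows it approaches the projection of the target data onto the source latent spaces. In this sketch `lam` plays the role that cross-validation tunes in the abstract: a small value is appropriate when the populations differ substantially, a large value when their latent spaces are similar.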
Problem

Research questions and friction points this paper is trying to address.

Transfer learning for low-rank matrix estimation across populations
Leveraging source data to enhance target matrix estimation
Addressing heterogeneity in biomedical data through latent space similarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer learning using latent space similarity
Low-rank approximation with cross-population penalty
Cross-validation for adaptive heterogeneity adjustment
Sean McGrath
Department of Biostatistics, Yale School of Public Health, Connecticut, USA
Cenhao Zhu
Operations Research Center, Massachusetts Institute of Technology, Massachusetts, USA
Min Guo
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Massachusetts, USA
Rui Duan
Harvard University
Biostatistics, Bioinformatics, Epidemiology, Electronic Health Record