Multiple Linked Tensor Factorization

📅 2025-02-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
High-throughput biomedical data, such as multi-omics datasets, are often multi-source, high-dimensional, and incomplete. This makes it difficult to reduce dimension while separating latent structure that is shared across sources from structure that is specific to a single source. To address this, the authors propose MULTIFAC, which extends ℓ₂-penalized CP decomposition to the joint factorization of multiple linked tensors. MULTIFAC decomposes all tensors within one framework, and the penalty automatically determines which latent components are shared across sources and which are specific to one source. An expectation-maximization (EM) version of the algorithm imputes missing entries. In simulations, MULTIFAC recovers the underlying signal, identifies shared and unshared structures, and imputes missing data; on multi-way multi-omics data from a study of early-life iron deficiency, it yields an interpretable decomposition.
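The distinction between shared and source-specific components can be illustrated with a small helper. This sketch rests on an assumption not stated above: that the tensors are linked along one common mode (for example, subjects) with a common factor matrix, while each source m has its own factor matrices for its remaining modes, so that component r is absent from source m when its columns there have been shrunk to zero. `classify_components` and its tolerance `tol` are hypothetical names, not part of any published MULTIFAC code.

```python
import numpy as np

def classify_components(source_factors, tol=1e-8):
    """Label each CP component as shared or source-specific (hypothetical helper).

    source_factors: one entry per source; each entry is a list of that source's
    factor matrices for its non-linked modes, all with R columns. A component is
    treated as present in a source when the product of its column norms there
    exceeds tol (assumption: zero columns mean the component is switched off).
    """
    n_sources = len(source_factors)
    R = source_factors[0][0].shape[1]
    labels = []
    for r in range(R):
        present = [
            m for m, mats in enumerate(source_factors)
            if np.prod([np.linalg.norm(F[:, r]) for F in mats]) > tol
        ]
        if len(present) == n_sources:
            labels.append("shared")
        elif len(present) == 1:
            labels.append(f"individual to source {present[0]}")
        elif not present:
            labels.append("absent from all sources")
        else:
            labels.append(f"shared by sources {present}")
    return labels

# Usage: two sources, three components; zeroed columns mimic penalty shrinkage.
rng = np.random.default_rng(0)
V1, W1 = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
V2, W2 = rng.normal(size=(6, 3)), rng.normal(size=(4, 3))
V1[:, 2] = 0.0  # component 2 switched off in source 0
V2[:, 1] = 0.0  # component 1 switched off in source 1
labels = classify_components([[V1, W1], [V2, W2]])
print(labels)  # ['shared', 'individual to source 0', 'individual to source 1']
```

In a fitted model the labels would come from the estimated factors rather than hand-zeroed columns, and the choice of `tol` matters because penalized estimates shrink toward zero without necessarily reaching it exactly in floating point.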

📝 Abstract
In biomedical research and other fields, it is now common to generate high content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data sets is needed, e.g., to capture and synthesize different facets of complex biological systems. However, despite growing interest in multi-source and multi-way factorization techniques, methods that can handle data that are both multi-source and multi-way are limited. In this work, we propose a Multiple Linked Tensors Factorization (MULTIFAC) method extending the CANDECOMP/PARAFAC (CP) decomposition to simultaneously reduce the dimension of multiple multi-way arrays and approximate underlying signal. We first introduce a version of the CP factorization with L2 penalties on the latent factors, leading to rank sparsity. When extended to multiple linked tensors, the method automatically reveals latent components that are shared across data sources or individual to each data source. We also extend the decomposition algorithm to its expectation-maximization (EM) version to handle incomplete data with imputation. Extensive simulation studies are conducted to demonstrate MULTIFAC's ability to (i) approximate underlying signal, (ii) identify shared and unshared structures, and (iii) impute missing data. The approach yields an interpretable decomposition on multi-way multi-omics data for a study on early-life iron deficiency.
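To make the L2-penalized CP factorization concrete, here is a minimal NumPy sketch for a single 3-way array. It assumes a squared-error loss and alternating least squares (ALS) as the optimizer; the paper's own algorithm may differ. With an L2 penalty on every factor matrix, each factor update becomes a closed-form ridge regression. The function names and the values of `rank` and `lam` are illustrative assumptions.

```python
import numpy as np

def cp_full(A, B, C):
    """Rebuild the 3-way array [[A, B, C]] from its CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def penalized_objective(X, A, B, C, lam):
    """Squared reconstruction error plus lam times the squared norms of all factors."""
    resid = X - cp_full(A, B, C)
    return np.sum(resid**2) + lam * (np.sum(A**2) + np.sum(B**2) + np.sum(C**2))

def penalized_cp_als(X, rank, lam, n_iter=200, seed=0):
    """Sketch of L2-penalized CP via alternating ridge regressions (assumed optimizer)."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(n, rank)) for n in X.shape)
    ridge = lam * np.eye(rank)
    history = [penalized_objective(X, A, B, C, lam)]
    for _ in range(n_iter):
        # Each line is an exact ridge solve for one factor with the other two fixed:
        # F = (X contracted with the other factors) @ inv(Hadamard of their Grams + lam*I).
        A = np.linalg.solve((B.T @ B) * (C.T @ C) + ridge, np.einsum("ijk,jr,kr->ri", X, B, C)).T
        B = np.linalg.solve((A.T @ A) * (C.T @ C) + ridge, np.einsum("ijk,ir,kr->rj", X, A, C)).T
        C = np.linalg.solve((A.T @ A) * (B.T @ B) + ridge, np.einsum("ijk,ir,jr->rk", X, A, B)).T
        history.append(penalized_objective(X, A, B, C, lam))
    return A, B, C, history

# Usage: a noiseless rank-2 array fitted with 5 components.
rng = np.random.default_rng(1)
X = cp_full(rng.normal(size=(10, 2)), rng.normal(size=(8, 2)), rng.normal(size=(6, 2)))
A, B, C, hist = penalized_cp_als(X, rank=5, lam=1.0)
# Component weight d_r = product of the column norms across the three modes.
weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0) * np.linalg.norm(C, axis=0)
print(np.round(np.sort(weights)[::-1], 3))
```

Because each update exactly minimizes the penalized objective in one factor matrix, the recorded objective never increases. Fitting more components than the true rank and inspecting the weights is one way to watch the penalty shrink unneeded components, though how far they shrink depends on `lam` and the number of iterations.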
Problem

Integrating multi-source, multi-way data
Extending CP decomposition for dimensionality reduction
Handling incomplete data with an EM algorithm
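The EM approach to incomplete data can be sketched by alternating two steps: fill the missing cells with the current reconstruction (the E-step, i.e. the conditional expectation under an assumed Gaussian noise model) and refit penalized CP factors to the completed array (the M-step). This is an assumption-based sketch, not the authors' implementation: it runs a single ridge-ALS sweep per M-step, which makes it a generalized EM, and the helper names, mean-value initialization, and 20% missingness are hypothetical choices.

```python
import numpy as np

def cp_full(A, B, C):
    """Rebuild the 3-way array [[A, B, C]] from CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ridge_als_sweep(X, A, B, C, lam):
    """One pass of exact ridge updates for A, B and C on a complete array X."""
    ridge = lam * np.eye(A.shape[1])
    A = np.linalg.solve((B.T @ B) * (C.T @ C) + ridge, np.einsum("ijk,jr,kr->ri", X, B, C)).T
    B = np.linalg.solve((A.T @ A) * (C.T @ C) + ridge, np.einsum("ijk,ir,kr->rj", X, A, C)).T
    C = np.linalg.solve((A.T @ A) * (B.T @ B) + ridge, np.einsum("ijk,ir,jr->rk", X, A, B)).T
    return A, B, C

def em_cp_impute(X, observed, rank, lam, n_iter=300, seed=0):
    """Hypothetical EM-style imputation with a ridge-penalized CP model.

    X: 3-way array (values at unobserved cells are ignored and may be NaN).
    observed: boolean mask of the same shape, True where X is observed.
    Returns the reconstruction, the factors, and the penalized
    observed-data objective recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(n, rank)) for n in X.shape)
    # Start by filling unobserved cells with the mean of the observed values.
    X_filled = np.where(observed, X, X[observed].mean())
    history = []
    for _ in range(n_iter):
        # M-step (approximate): one ALS sweep on the completed array.
        A, B, C = ridge_als_sweep(X_filled, A, B, C, lam)
        X_hat = cp_full(A, B, C)
        # E-step: replace unobserved cells with the current reconstruction.
        X_filled = np.where(observed, X, X_hat)
        penalty = lam * (np.sum(A**2) + np.sum(B**2) + np.sum(C**2))
        history.append(np.sum((X - X_hat)[observed] ** 2) + penalty)
    return X_hat, (A, B, C), history

# Usage: hide roughly 20% of a rank-2 array and impute it.
rng = np.random.default_rng(2)
X_true = cp_full(rng.normal(size=(10, 2)), rng.normal(size=(8, 2)), rng.normal(size=(6, 2)))
observed = rng.random(X_true.shape) > 0.2
X_obs = np.where(observed, X_true, np.nan)
X_hat, factors, hist = em_cp_impute(X_obs, observed, rank=3, lam=0.1)
rmse = np.sqrt(np.mean((X_hat - X_true)[~observed] ** 2))
print(f"RMSE on hidden cells: {rmse:.4f}")
```

The completed-data objective upper-bounds the observed-data objective and matches it at the current factors, so each sweep cannot increase the penalized loss on the observed entries, which is what `hist` tracks.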
Innovation

Extends CP decomposition to multiple linked multi-way arrays
Introduces L2 penalties on the latent factors, which induce rank sparsity
Uses an EM algorithm for missing-data imputation
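Why L2 penalties on the factors lead to rank sparsity can be seen with a standard rescaling argument, sketched here for a K-way CP model with factor matrices A^(1) through A^(K) and component weights d_r; this is a general derivation, not necessarily the one given in the paper. Multiplying one column of a component by c and another by 1/c leaves the fitted tensor unchanged, so at a minimizer the penalty is already minimized over such rescalings, and the AM-GM inequality gives its minimum value:

```latex
\begin{aligned}
&\min_{\{\mathbf{A}^{(k)}\}} \Big\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r^{(1)} \circ \cdots \circ \mathbf{a}_r^{(K)} \Big\|_F^2
  + \lambda \sum_{k=1}^{K} \big\|\mathbf{A}^{(k)}\big\|_F^2, \\
&\sum_{k=1}^{K} \big\|\mathbf{a}_r^{(k)}\big\|^2 \;\ge\; K \Big( \prod_{k=1}^{K} \big\|\mathbf{a}_r^{(k)}\big\| \Big)^{2/K} = K\, d_r^{2/K}
  \quad \text{(equality when every } \|\mathbf{a}_r^{(k)}\| = d_r^{1/K}\text{)}, \\
&\Longrightarrow\; \min_{d_r \ge 0,\ \|\mathbf{u}_r^{(k)}\| = 1} \Big\| \mathcal{X} - \sum_{r=1}^{R} d_r\, \mathbf{u}_r^{(1)} \circ \cdots \circ \mathbf{u}_r^{(K)} \Big\|_F^2
  + \lambda K \sum_{r=1}^{R} d_r^{2/K}.
\end{aligned}
```

For K = 2 the equivalent penalty is 2λ times the sum of the weights, an ℓ1 penalty. For K ≥ 3 the exponent 2/K is below 1, giving a concave ℓ_q-type penalty that can set some weights d_r exactly to zero, so the effective rank is selected by λ rather than fixed in advance.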
Authors
Zhiyu Kang
Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, 55455, MN, USA.
Raghavendra B. Rao
Division of Neonatology, Department of Pediatrics, University of Minnesota, Minneapolis, 55455, MN, USA.
Eric F. Lock
Associate Professor, Biostatistics, University of Minnesota
Statistics, machine learning, high-dimensional data