Multiple Linked Tensor Factorization

📅 2025-02-27

🤖 AI Summary

High-throughput biomedical data, such as multi-omics datasets, are often both multi-source and multi-way, high-dimensional, and incomplete, which makes it hard to reduce dimension while separating shared from source-specific latent structure. To address this, we propose MULTIFAC, a method that extends an L2-penalized CANDECOMP/PARAFAC (CP) decomposition to the joint modeling of multiple linked tensors. MULTIFAC decomposes the tensors within a single framework and automatically identifies latent components that are shared across sources and components that are individual to one source. An expectation-maximization (EM) version of the algorithm imputes missing entries. In simulation studies, MULTIFAC is shown to approximate the underlying signal, identify shared and unshared structure, and impute missing data. Applied to multi-way multi-omics data from a study of early-life iron deficiency, it yields an interpretable decomposition.

📝 Abstract

In biomedical research and other fields, it is now common to generate high-content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies, while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data sets is needed, e.g., to capture and synthesize different facets of complex biological systems. However, despite growing interest in multi-source and multi-way factorization techniques, methods that can handle data that are both multi-source and multi-way are limited. In this work, we propose a Multiple Linked Tensors Factorization (MULTIFAC) method that extends the CANDECOMP/PARAFAC (CP) decomposition to simultaneously reduce the dimension of multiple multi-way arrays and approximate the underlying signal. We first introduce a version of the CP factorization with L2 penalties on the latent factors, leading to rank sparsity. When extended to multiple linked tensors, the method automatically reveals latent components that are shared across data sources or individual to each data source. We also extend the decomposition algorithm to an expectation-maximization (EM) version that handles incomplete data through imputation. Extensive simulation studies are conducted to demonstrate MULTIFAC's ability to (i) approximate the underlying signal, (ii) identify shared and unshared structures, and (iii) impute missing data. The approach yields an interpretable decomposition of multi-way multi-omics data from a study on early-life iron deficiency.
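To make the idea of a CP factorization with L2 penalties on the latent factors concrete, here is a minimal sketch for a single three-way array, fitted by alternating ridge regressions. It is an illustration under stated assumptions, not the authors' implementation: the objective form (squared error plus `lam` times the sum of squared factor entries), the alternating least squares solver, and all names (`penalized_cp_als`, `khatri_rao`, `unfold`, `component_weights`, `lam`) are hypothetical choices made for this example. The linking of several tensors described in the abstract is not shown.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row j*K + k equals B[j] * C[k]."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def unfold(X, mode):
    """Mode-n unfolding with the remaining axes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def penalized_cp_als(X, rank, lam=0.1, n_iter=200, seed=0):
    """Assumed objective: ||X - [[A, B, C]]||^2 + lam * (||A||^2 + ||B||^2 + ||C||^2)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            Z = khatri_rao(first, second)
            # Z.T @ Z equals the Hadamard product of the two Gram matrices.
            gram = (first.T @ first) * (second.T @ second) + lam * np.eye(rank)
            # Ridge update: factor = X_(mode) Z (Z^T Z + lam I)^{-1}
            factors[mode] = np.linalg.solve(gram, (unfold(X, mode) @ Z).T).T
    return factors


def reconstruct(factors):
    """Rebuild the three-way array from its CP factors."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)


def component_weights(factors):
    """Product of column norms: the overall size of each rank-one component."""
    return np.prod([np.linalg.norm(F, axis=0) for F in factors], axis=0)
```

One way to explore the rank-sparsity behavior the abstract mentions is to fit with a `rank` larger than needed and inspect `component_weights(factors)`: in this sketch the penalty shrinks every component, and how strongly surplus components are driven toward zero depends on `lam` and on the data.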

Problem

Research questions and friction points this paper is trying to address.

Integrating multi-source multi-way data

Extending CP decomposition to reduce the dimension of multiple linked tensors

Handling incomplete data with EM algorithm

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends CP decomposition to multiple linked multi-way arrays

Introduces L2 penalties on the latent factors, which lead to rank sparsity

Uses EM algorithm for missing data imputation
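The imputation idea in the last item can be sketched as an EM-style loop around a penalized CP fit: fill the missing cells with the current model estimate, refit the factors on the completed array, and repeat. The code below is a hedged illustration for one three-way array with missing entries marked as NaN. The function names (`ridge_cp_step`, `em_impute_cp`), the mean initialization, the fixed iteration count, and the single-sweep refit per iteration are assumptions of this example and are not taken from the paper.

```python
import numpy as np


def ridge_cp_step(X, factors, lam):
    """One sweep of ridge-penalized alternating least squares over the three modes."""
    A, B, C = factors
    eye = np.eye(A.shape[1])
    # Each update solves (Gram + lam * I) F^T = MTTKRP^T for one factor matrix F.
    A = np.linalg.solve((B.T @ B) * (C.T @ C) + lam * eye,
                        np.einsum('ijk,jr,kr->ri', X, B, C)).T
    B = np.linalg.solve((A.T @ A) * (C.T @ C) + lam * eye,
                        np.einsum('ijk,ir,kr->rj', X, A, C)).T
    C = np.linalg.solve((A.T @ A) * (B.T @ B) + lam * eye,
                        np.einsum('ijk,ir,jr->rk', X, A, B)).T
    return A, B, C


def em_impute_cp(X, rank, lam=0.1, n_iter=300, seed=0):
    """Impute NaN entries of a three-way array with an EM-style penalized CP fit."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    # Start by filling the missing cells with the mean of the observed cells.
    filled = np.where(observed, X, np.nanmean(X))
    factors = tuple(rng.standard_normal((dim, rank)) for dim in X.shape)
    for _ in range(n_iter):
        # M-step-like update: refit the factors on the completed array.
        factors = ridge_cp_step(filled, factors, lam)
        # E-step-like update: replace missing cells with the model estimate.
        estimate = np.einsum('ir,jr,kr->ijk', *factors)
        filled = np.where(observed, X, estimate)
    return filled, factors
```

Observed entries are never overwritten in this sketch; only the cells that were NaN in the input change between iterations. A convergence check on the change in the imputed values could replace the fixed `n_iter`.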
