Chain-linked multiple matrix integration via embedding alignment

📅 2024-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses block-wise missing matrix completion in multi-source data integration: only a few noisy, partially overlapping submatrices are observed, and the goal is to reconstruct the full matrix. The authors propose Chain-linked Multiple Matrix Integration (CMMI), which derives entity embeddings for each observed submatrix, aligns them through the entities shared by pairs of submatrices, and aggregates the aligned embeddings to recover the full matrix. Under mild regularity conditions, they establish entrywise error bounds and normal approximations for the CMMI estimates. Simulation studies and real data applications indicate that CMMI is computationally efficient and recovers the full matrix effectively even when the overlaps between observed submatrices are minimal.

📝 Abstract
Motivated by the increasing demand for multi-source data integration in various scientific fields, in this paper we study matrix completion in scenarios where the data exhibits certain block-wise missing structures, specifically where only a few noisy submatrices representing (overlapping) parts of the full matrix are available. We propose the Chain-linked Multiple Matrix Integration (CMMI) procedure to efficiently combine the information that can be extracted from these individual noisy submatrices. CMMI begins by deriving entity embeddings for each observed submatrix, then aligns these embeddings using overlapping entities between pairs of submatrices, and finally aggregates them to reconstruct the entire matrix of interest. We establish, under mild regularity conditions, entrywise error bounds and normal approximations for the CMMI estimates. Simulation studies and real data applications show that CMMI is computationally efficient and effective in recovering the full matrix, even when overlaps between the observed submatrices are minimal.
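
The three steps in the abstract (embed, align, aggregate) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric, positive semidefinite, rank-d target matrix, uses a spectral embedding for each submatrix, and uses orthogonal Procrustes on the shared entities as the alignment step. The function names `embed`, `procrustes`, and `cmmi_chain` are hypothetical, and the demo is noiseless so that recovery of the unobserved block can be checked exactly.

```python
import numpy as np

def embed(A, d):
    """Rank-d spectral embedding of a symmetric matrix A (assumed choice):
    top-d eigenvectors scaled by the square root of |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def procrustes(X, Y):
    """Orthogonal matrix W minimizing ||X W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def cmmi_chain(blocks, index_sets, d):
    """Estimate the unobserved block between the first and last index set.

    blocks[i] is the observed symmetric submatrix on entities index_sets[i];
    consecutive index sets are assumed to share at least d entities.
    """
    embs = [embed(A, d) for A in blocks]
    W = np.eye(d)
    for i in range(len(blocks) - 1):
        # Row positions of the shared entities inside each submatrix.
        _, rows_i, rows_j = np.intersect1d(
            index_sets[i], index_sets[i + 1], return_indices=True
        )
        # Compose pairwise alignments along the chain.
        W = W @ procrustes(embs[i][rows_i], embs[i + 1][rows_j])
    return embs[0] @ W @ embs[-1].T

# Toy example: three submatrices forming a chain; the first and third
# share no entities, so their cross block is never observed directly.
rng = np.random.default_rng(0)
n, d = 60, 3
Z = rng.normal(size=(n, d))
P = Z @ Z.T  # rank-d positive semidefinite target matrix
index_sets = [np.arange(0, 25), np.arange(20, 45), np.arange(40, 60)]
blocks = [P[np.ix_(s, s)] for s in index_sets]

P13_hat = cmmi_chain(blocks, index_sets, d)
err = np.max(np.abs(P13_hat - P[np.ix_(index_sets[0], index_sets[2])]))
print("max abs error on the unobserved block:", err)
```

In this sketch each pair of consecutive submatrices shares only five entities, which is enough to identify a 3 x 3 orthogonal alignment. With noisy submatrices the same code runs unchanged, but the recovered block is then only an approximation.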

Problem

Research questions and friction points this paper is trying to address.

Integrating multi-source data with block-wise missing structures
Aligning embeddings from noisy overlapping submatrices for reconstruction
Estimating full matrix accurately with minimal submatrix overlaps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-linked matrix integration via embedding alignment
Aligns embeddings using overlapping entities
Aggregates embeddings to reconstruct full matrix
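
One common way to write the align-and-aggregate idea down is given here as an illustration. The notation is assumed rather than taken from the abstract: $\hat{X}^{(i)}$ is the embedding estimated from the $i$-th observed submatrix on entity set $\mathcal{U}_i$, $\mathcal{O}_d$ is the set of $d \times d$ orthogonal matrices, and consecutive entity sets in the chain $\mathcal{U}_1, \dots, \mathcal{U}_L$ are assumed to overlap. Orthogonal Procrustes is an assumed choice for the alignment step.

```latex
% Pairwise alignment on the shared entities (assumed: orthogonal Procrustes)
\hat{W}^{(i,j)} = \operatorname*{arg\,min}_{W \in \mathcal{O}_d}
  \bigl\| \hat{X}^{(i)}_{\mathcal{U}_i \cap \mathcal{U}_j} W
        - \hat{X}^{(j)}_{\mathcal{U}_i \cap \mathcal{U}_j} \bigr\|_F

% Chain-linked estimate of a block whose entity sets do not overlap directly
\hat{P}_{\mathcal{U}_1, \mathcal{U}_L}
  = \hat{X}^{(1)} \, \hat{W}^{(1,2)} \hat{W}^{(2,3)} \cdots \hat{W}^{(L-1,L)} \,
    \hat{X}^{(L)\top}
```

The product of pairwise alignments is what lets information pass between submatrices that share no entities, provided each consecutive pair in the chain overlaps.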
Authors

Runbing Zheng
Department of Applied Mathematics and Statistics, Johns Hopkins University
Minh Tang
North Carolina State University
graph inference, dimension reduction