Matrix Factorization for Inferring Associations and Missing Links

📅 2025-03-06
🤖 AI Summary
This work addresses link prediction in domains including knowledge graphs, biological networks, and nuclear proliferation detection. It targets the over- and under-fitting that arises in non-negative matrix factorization (NMF) when the rank is chosen manually. The authors propose three NMF variants (WNMFk, BNMFk, and RNMFk) along with ensemble variants that incorporate logistic factorization. Rank is selected automatically by assessing stability and accuracy with a modified bootstrap procedure and uncertainty quantification (UQ). For Boolean factorization, Otsu thresholding and k-means clustering are compared with coordinate descent-based thresholding, and UQ is used to abstain from low-confidence predictions. Evaluated on three synthetic datasets and five real-world protein-protein interaction networks, the methods show improved prediction performance over LMF and symLMF. The experiments also indicate that UQ with abstention supports more reliable predictions.

📝 Abstract
Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. It identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology, a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selecting the matrix rank, that is, the number of hidden features k, to avoid over- and under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation by evaluating stability and accuracy using a modified bootstrap methodology and uncertainty quantification (UQ), which assesses prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization and compare them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of the choice of rank k, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showing improved prediction performance.
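To make the idea of factorization-based link prediction concrete, here is a minimal sketch of a masked (weighted) NMF fitted with standard multiplicative updates. It is not the authors' WNMFk implementation: the function name `masked_nmf`, the binary mask, the toy data, and the fixed rank `k=2` are all assumptions chosen for illustration, and the automatic rank selection described in the abstract is not shown.

```python
import numpy as np

def masked_nmf(X, mask, k, n_iter=200, seed=0, eps=1e-9):
    """Fit X ~ W @ H using only the entries where mask == 1.

    Illustrative multiplicative updates for the masked Frobenius loss;
    hypothetical helper, not the paper's WNMFk.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    MX = mask * X  # observed entries only
    errors = []
    for _ in range(n_iter):
        W *= (MX @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (mask * (W @ H)) + eps)
        # Reconstruction error measured on observed entries only
        errors.append(float(np.linalg.norm(mask * (X - W @ H))))
    return W, H, errors

# Toy data (assumed): a rank-2 nonnegative matrix with about 20% of entries hidden
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 15))
mask = (rng.random(X.shape) > 0.2).astype(float)

W, H, errors = masked_nmf(X, mask, k=2)
scores = W @ H  # entries where mask == 0 are the predicted link scores
```

The hidden entries never influence the fit, so `scores` at those positions can be compared against held-out values to estimate prediction quality for a given rank.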
Problem

Research questions and friction points this paper is trying to address.

Predicts missing links in networks using matrix factorization.
Addresses over/under-fitting in dimensionality reduction techniques.
Improves reliability of predictions with uncertainty quantification.
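One simple way to picture prediction with abstention is sketched below: each candidate link gets scores from several perturbed runs, and the model declines to predict when those scores disagree too much. The abstract does not give the paper's actual UQ criterion, so the function name `predict_with_abstention`, the standard-deviation rule, and the `threshold` and `max_std` values are assumptions.

```python
from statistics import mean, pstdev

def predict_with_abstention(ensemble_scores, threshold=0.5, max_std=0.15):
    """Return 1/0 per candidate link, or None to abstain.

    ensemble_scores: one list of scores per candidate link, collected
    from repeated runs on perturbed data (hypothetical setup).
    """
    decisions = []
    for scores in ensemble_scores:
        mu, sigma = mean(scores), pstdev(scores)
        if sigma > max_std:
            decisions.append(None)  # runs disagree: abstain
        else:
            decisions.append(int(mu >= threshold))
    return decisions

# Made-up scores for three candidate links across four perturbed runs
runs = [
    [0.91, 0.88, 0.93, 0.90],  # consistently high
    [0.10, 0.85, 0.40, 0.95],  # unstable across runs
    [0.05, 0.08, 0.02, 0.06],  # consistently low
]
decisions = predict_with_abstention(runs)
print(decisions)  # [1, None, 0]
```

Raising `max_std` trades coverage for reliability: fewer abstentions, but more predictions made on unstable scores.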
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods.
Integrates automatic rank estimation using a modified bootstrap methodology and uncertainty quantification.
Uses Otsu thresholding and k-means for Boolean factorization.
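As a rough illustration of the Otsu step, the sketch below picks a cut point that maximizes the between-class variance of a histogram of real-valued scores, then binarizes the scores at that point. How the paper applies the threshold inside BNMFk is not stated here, so the function name `otsu_threshold`, the bin count, and the sample values are assumptions.

```python
def otsu_threshold(values, bins=64):
    """Return the cut point maximizing between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))

    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width  # upper edge of bin i
    return best_t

# Made-up factor entries forming two clear groups
values = [0.05, 0.1, 0.12, 0.08, 0.9, 0.85, 0.95, 0.88]
t = otsu_threshold(values)
binary = [int(v > t) for v in values]
print(binary)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Unlike a fixed cut at 0.5, the threshold adapts to the distribution of the values, which matters when factor entries are not scaled to the unit interval.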
Ryan Barron
Theoretical Division, Los Alamos National Laboratory, USA
M. Eren
Information Systems and Modeling, Los Alamos National Laboratory, USA
D. Truong
Theoretical Division, Los Alamos National Laboratory, USA
Cynthia Matuszek
Associate Professor, UMBC
robotics, natural language grounding, machine learning, knowledge representation
James Wendelberger
Computer, Computational, and Statistical Sciences, Los Alamos National Laboratory, USA
Mary F. Dorn
Computer, Computational, and Statistical Sciences, Los Alamos National Laboratory, USA
Boian Alexandrov
Theoretical Division, Los Alamos National Laboratory, USA