Topic Modeling and Link-Prediction for Material Property Discovery

📅 2025-07-08
🤖 AI Summary
Scientific literature networks are typically large, sparse, and noisy, and often lack material-topic associations that should be present. To address this, we propose an AI-driven hierarchical link prediction framework that integrates hierarchical non-negative matrix factorization (HNMFk) with Boolean matrix factorization (BNMFk), augmented by logistic matrix factorization (LMF), to construct a three-level interpretable topic tree. The framework automatically uncovers latent associations of transition-metal dichalcogenides (TMDs) with domains such as superconductivity and energy storage. Using probabilistic scoring and automatic model selection, it produces traceable link predictions on a sparse knowledge graph built from a 46,862-document corpus. In the validation experiment, the model recovers links to superconducting clusters after publications on superconductivity in well-known superconductors are removed, and it generates cross-disciplinary hypotheses that can be tested empirically. An accompanying interactive dashboard supports human-in-the-loop discovery.
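
To make the factorization step more concrete, the following is a minimal sketch of plain non-negative matrix factorization with multiplicative updates, the basic building block that hierarchical variants apply recursively. It is not the HNMFk implementation: the toy matrix, the fixed `k`, the iteration count, and all function names are assumptions chosen for illustration, and neither the automatic selection of `k` nor the recursion into sub-clusters is shown.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)) ** 0.5

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor X (n x m, non-negative) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy document-by-term count matrix (assumed): two groups of documents
# that use mostly disjoint vocabulary.
X = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
W, H = nmf(X, k=2)
print("reconstruction error:", round(frobenius_error(X, W, H), 3))
```

Each row of `W` can be read as a document's weight on each latent topic, and each row of `H` as a topic's weight over terms. A hierarchical scheme would then re-run the factorization on the documents assigned to each topic.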

📝 Abstract
Link prediction infers missing or future relations between graph nodes based on connection patterns. Scientific literature networks and knowledge graphs are typically large, sparse, and noisy, and often contain missing links between entities. We present an AI-driven hierarchical link prediction framework that integrates matrix factorization to infer hidden associations and steer discovery in complex material domains. Our method combines Hierarchical Nonnegative Matrix Factorization (HNMFk) and Boolean matrix factorization (BNMFk) with automatic model selection, as well as Logistic matrix factorization (LMF), which we use to construct a three-level topic tree from a 46,862-document corpus focused on 73 transition-metal dichalcogenides (TMDs). These materials are studied in a variety of physics fields and have many current and potential applications. An ensemble BNMFk + LMF approach fuses discrete interpretability with probabilistic scoring. The resulting HNMFk clusters map each material onto coherent topics such as superconductivity, energy storage, and tribology. In addition, missing or weakly connected links between topics and materials are highlighted, suggesting novel hypotheses for cross-disciplinary exploration. We validate our method by removing publications about superconductivity in well-known superconductors and show that the model predicts associations with the superconducting TMD clusters. This shows that the method finds hidden connections in a graph of material-to-latent-topic associations built from scientific literature, which is especially useful when examining a diverse corpus of scientific documents that cover the same class of phenomena or materials but originate from distinct communities and perspectives. The inferred links, which generate new hypotheses, are exposed through an interactive Streamlit dashboard designed for human-in-the-loop scientific discovery.
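
The validation idea in the abstract, hiding known links and checking whether the model recovers them, can be illustrated with a small logistic matrix factorization. The sketch below is a generic LMF trained by full-batch gradient ascent on the log-likelihood of the observed entries only. The toy matrix, the rank, learning rate, regularization strength, and every function name are assumptions, not the paper's settings or code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lmf(X, observed, rank=2, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Fit P(x_ij = 1) = sigmoid(u_i . v_j) using only the observed cells."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        # Start each gradient with the L2 penalty term.
        gU = [[-reg * u for u in row] for row in U]
        gV = [[-reg * v for v in row] for row in V]
        for i, j in observed:
            p = sigmoid(sum(a * b for a, b in zip(U[i], V[j])))
            err = X[i][j] - p  # gradient of the log-likelihood w.r.t. the logit
            for k in range(rank):
                gU[i][k] += err * V[j][k]
                gV[j][k] += err * U[i][k]
        for i in range(n_rows):
            for k in range(rank):
                U[i][k] += lr * gU[i][k]
        for j in range(n_cols):
            for k in range(rank):
                V[j][k] += lr * gV[j][k]
    return U, V

def score(U, V, i, j):
    """Predicted probability of a link between row i and column j."""
    return sigmoid(sum(a * b for a, b in zip(U[i], V[j])))

# Toy material-by-topic matrix (assumed): rows 0-2 link to topics 0-1,
# rows 3-5 link to topics 2-3.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
held_out = (0, 1)  # a true link hidden from training
observed = [(i, j) for i in range(6) for j in range(4) if (i, j) != held_out]

U, V = train_lmf(X, observed)
print("held-out link:", round(score(U, V, *held_out), 3))
print("true non-link:", round(score(U, V, 0, 2), 3))
```

Because the hidden cell never contributes to the gradient, a high score for it comes only from the low-rank structure shared with similar rows and columns, which is the property a mask-and-recover test probes.
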
Problem

Research questions and friction points this paper is trying to address.

Predict missing links in large, sparse materials science literature networks
Discover hidden associations in transition-metal dichalcogenide (TMD) research
Enable cross-disciplinary hypothesis generation via topic-material link prediction
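
As a rough illustration of the data structure behind these goals, the sketch below builds a binary material-by-topic matrix from per-document records and lists the unobserved pairs that a link predictor would later score. The record layout (`materials`, `topic`), the placeholder names `TMD_A` to `TMD_C`, and the `min_docs` threshold are all hypothetical and are not taken from the paper's pipeline.

```python
from collections import defaultdict

def build_link_matrix(documents, min_docs=1):
    """Return (materials, topics, matrix) with matrix[i][j] = 1 when at least
    min_docs documents in topic j mention material i."""
    counts = defaultdict(int)
    for doc in documents:
        for material in set(doc["materials"]):  # count each document once
            counts[(material, doc["topic"])] += 1
    materials = sorted({m for m, _ in counts})
    topics = sorted({t for _, t in counts})
    matrix = [[1 if counts.get((m, t), 0) >= min_docs else 0 for t in topics]
              for m in materials]
    return materials, topics, matrix

def candidate_links(materials, topics, matrix):
    """Material-topic pairs with no observed link: the cells to be scored."""
    return [(m, t)
            for i, m in enumerate(materials)
            for j, t in enumerate(topics)
            if matrix[i][j] == 0]

# Hypothetical records: each document has a topic label and mentions materials.
documents = [
    {"materials": ["TMD_A"], "topic": "superconductivity"},
    {"materials": ["TMD_A", "TMD_B"], "topic": "energy storage"},
    {"materials": ["TMD_B"], "topic": "tribology"},
    {"materials": ["TMD_C"], "topic": "superconductivity"},
]

materials, topics, matrix = build_link_matrix(documents)
for pair in candidate_links(materials, topics, matrix):
    print(pair)
```

In a sparse corpus most cells are zero, so the candidate list is long. The factorization models are what separate plausible missing links from true absences.
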
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-driven hierarchical link prediction framework
Combines HNMFk, BNMFk, and LMF techniques
Interactive Streamlit dashboard for discovery
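
One possible way to combine a Boolean factorization with probabilistic scores is sketched below. The abstract does not specify the fusion rule, so this one is an assumption: a pair is kept as a candidate when the Boolean reconstruction (OR over latent factors of an AND) predicts a link that is absent from the observed matrix, and candidates are then ranked by a probability from a separate scoring model. All matrices and the threshold are illustrative values.

```python
def boolean_product(W, H):
    """Boolean matrix product: out[i][j] = OR_k (W[i][k] AND H[k][j])."""
    return [[int(any(w and h for w, h in zip(row, col))) for col in zip(*H)]
            for row in W]

def rank_candidates(observed, W, H, probs, threshold=0.5):
    """Keep unobserved cells that the Boolean model reconstructs as 1 and whose
    probability is at least `threshold`; return them sorted by probability."""
    recon = boolean_product(W, H)
    hits = []
    for i, row in enumerate(observed):
        for j, x in enumerate(row):
            if x == 0 and recon[i][j] == 1 and probs[i][j] >= threshold:
                hits.append((i, j, probs[i][j]))
    return sorted(hits, key=lambda item: -item[2])

# Illustrative inputs (assumed): 3 materials, 3 topics, 2 Boolean factors.
observed = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
W = [[1, 0], [1, 0], [0, 1]]   # material-to-factor membership
H = [[1, 1, 0], [0, 0, 1]]     # factor-to-topic membership
probs = [                      # scores from a probabilistic model
    [0.90, 0.80, 0.10],
    [0.95, 0.90, 0.05],
    [0.10, 0.20, 0.90],
]

print(boolean_product(W, H))
print(rank_candidates(observed, W, H, probs))
```

The Boolean factors give a discrete explanation for each candidate (which shared factor implies the link), while the probability orders candidates for review.
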