Bayesian Non-Negative Matrix Factorization with Correlated Mutation Type Probabilities for Mutational Signatures

📅 2025-06-18
🤖 AI Summary
Problem: Conventional mutational signature analysis methods, including standard and Bayesian non-negative matrix factorization (NMF), assume independence among mutation types in the signatures matrix, which conflicts with biological evidence of dependencies between mutation types. Method: The paper proposes two Bayesian NMF models. The first places a multivariate truncated normal prior on the signatures matrix, with the covariance structure among mutation types estimated from the COSMIC signatures database. The second is a hierarchical model that learns the covariance structure from the data instead of fixing it upfront. Contribution/Results: Compared with a model that uses independent truncated normal priors on the elements of the signatures matrix, the COSMIC-informed model converges in fewer MCMC iterations and is more accurate, especially at small sample sizes. The code is contributed to an open-source R package, and the approach applies to uses of NMF beyond single-base substitution (SBS) mutational signatures.

📝 Abstract
Somatic mutations, or alterations in the DNA of a somatic cell, are key markers of cancer. In recent years, mutational signature analysis has become a prominent field of study within cancer research, most commonly using Non-Negative Matrix Factorization (NMF) and Bayesian NMF. However, current methods assume independence across mutation types in the signatures matrix. This paper expands upon current Bayesian NMF methodologies by proposing novel methods that account for the dependencies between mutation types. First, we implement the Bayesian NMF specification with a Multivariate Truncated Normal prior on the signatures matrix in order to model the covariance structure using external information, in our case estimated from the COSMIC signatures database. Under MCMC inference, this model converges in fewer iterations than a model with independent Truncated Normal priors on the elements of the signatures matrix, and it improves accuracy, especially at small sample sizes. In addition, we develop a hierarchical model that allows the covariance structure of the signatures matrix to be discovered rather than specified upfront, giving the algorithm more flexibility. Learning the dependence structure of the signatures in this way allows a better understanding of biological interactions and how these change across different types of cancer. The code for this project is contributed to an open-source R software package. Our work lays the groundwork for future research to incorporate dependency structure across mutation types in the signatures matrix and is also applicable to any use of NMF beyond single-base substitution (SBS) mutational signatures.
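To make the idea of a covariance-aware prior concrete, the following is a minimal Python sketch, not the paper's R implementation. It assumes a small synthetic matrix as a stand-in for reference signatures (the real COSMIC SBS catalogue has 96 mutation types), estimates a covariance matrix across mutation types, and draws from a multivariate normal truncated to non-negative values by plain rejection sampling. The names `reference`, `sigma_indep`, and `sample_mvtn` are hypothetical, and both the covariance estimator and the sampler are assumptions of this sketch; the abstract does not state which ones the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a reference signature matrix such as COSMIC SBS:
# rows are reference signatures, columns are mutation types.
# Real SBS spectra have 96 mutation types; 6 keeps the example small.
n_ref, n_types = 40, 6
reference = rng.dirichlet(np.full(n_types, 5.0), size=n_ref)

# Empirical mean and covariance across mutation types.
mu = reference.mean(axis=0)
sigma = np.cov(reference, rowvar=False)
# Each row sums to 1, so the sample covariance is singular; a small ridge
# (an assumption of this sketch) makes it positive definite.
sigma += 1e-6 * np.eye(n_types)

# The independence assumption corresponds to dropping the off-diagonal terms.
sigma_indep = np.diag(np.diag(sigma))


def sample_mvtn(mu, sigma, n_draws, rng, max_batches=1000):
    """Draw from N(mu, sigma) truncated to the non-negative orthant.

    Uses simple rejection sampling: propose from the untruncated normal
    and keep only draws with every coordinate >= 0.
    """
    chol = np.linalg.cholesky(sigma)
    kept, n_kept = [], 0
    for _ in range(max_batches):
        z = rng.standard_normal((n_draws, mu.size))
        draws = mu + z @ chol.T  # covariance of draws is chol @ chol.T = sigma
        accepted = draws[(draws >= 0).all(axis=1)]
        kept.append(accepted)
        n_kept += len(accepted)
        if n_kept >= n_draws:
            return np.concatenate(kept)[:n_draws]
    raise RuntimeError("acceptance rate too low for rejection sampling")


prior_draws = sample_mvtn(mu, sigma, 1000, rng)
print(prior_draws.shape, bool((prior_draws >= 0).all()))
```

Rejection sampling is used here only because it is easy to verify. Its acceptance rate drops quickly as the number of mutation types grows, so a coordinate-wise Gibbs sampler or another specialised truncated-normal sampler would likely be needed at 96 dimensions.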
Problem

Research questions and friction points this paper is trying to address.

Model dependencies between mutation types in Bayesian NMF.
Improve accuracy of mutational signature analysis with covariance modeling.
Enable flexible learning of mutation type interactions across cancers.
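For orientation, the factorization that these Bayesian models build on can be sketched with classical NMF. The example below uses the standard multiplicative updates for the squared-error objective (Lee and Seung), not the Bayesian model described in the abstract, and runs on synthetic counts. The function name `nmf_multiplicative`, the matrix orientation (mutation types by samples), and the Poisson-simulated data are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(M, n_signatures, n_iter=300, seed=0, eps=1e-10):
    """Classical NMF, M ~ P @ E, via multiplicative updates (squared error).

    M: (mutation types x samples) non-negative matrix
    P: (mutation types x signatures) signatures matrix
    E: (signatures x samples) exposures matrix
    """
    rng = np.random.default_rng(seed)
    n_types, n_samples = M.shape
    P = rng.random((n_types, n_signatures)) + eps
    E = rng.random((n_signatures, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        E *= (P.T @ M) / (P.T @ P @ E + eps)
        P *= (M @ E.T) / (P @ (E @ E.T) + eps)
    # Rescale so each signature (column of P) sums to 1; P @ E is unchanged.
    scale = P.sum(axis=0)
    return P / scale, E * scale[:, None]


# Synthetic data: 96 mutation types, 3 signatures, 30 samples.
rng = np.random.default_rng(1)
P_true = rng.dirichlet(np.ones(96), size=3).T      # 96 x 3, columns sum to 1
E_true = rng.gamma(2.0, 500.0, size=(3, 30))       # 3 x 30 exposures
M = rng.poisson(P_true @ E_true).astype(float)     # observed mutation counts

P_hat, E_hat = nmf_multiplicative(M, n_signatures=3)
rel_error = np.linalg.norm(M - P_hat @ E_hat) / np.linalg.norm(M)
print(f"relative reconstruction error: {rel_error:.3f}")
```

Each row of `P` is updated without reference to any other row beyond the shared reconstruction, which is the sense in which mutation types carry no explicit dependence structure in this baseline.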
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian NMF with a Multivariate Truncated Normal prior on the signatures matrix, with covariance estimated from COSMIC.
Hierarchical model that learns the covariance structure from the data.
Code contributed to an open-source R software package.
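
The three prior choices contrasted above can be written side by side. The notation below is assumed for this sketch and does not come from the source: $K$ mutation types, $N$ signatures, $G$ samples, $P_{\cdot n}$ for the $n$-th signature (a column of $P$), and a generic hyperprior $\pi(\Sigma)$. The abstract does not state the likelihood or the hyperprior, so both are left unspecified here; an inverse-Wishart is one common choice for $\pi(\Sigma)$, but that is an assumption and not a confirmed detail of the paper.

```latex
\begin{align*}
  % Factorization: counts M, signatures P, exposures E
  M &\approx P E, \qquad
     M \in \mathbb{R}_{\ge 0}^{K \times G},\;
     P \in \mathbb{R}_{\ge 0}^{K \times N},\;
     E \in \mathbb{R}_{\ge 0}^{N \times G} \\[4pt]
  % Baseline: independent priors on each element of P
  \text{Independent:}\quad
  P_{kn} &\sim \mathrm{TN}\!\left(\mu_{kn}, \sigma_{kn}^{2};\, [0,\infty)\right)
     \quad \text{independently over } k, n \\[4pt]
  % Correlated prior with covariance fixed from external information
  \text{Fixed covariance:}\quad
  P_{\cdot n} &\sim \mathrm{MVTN}_{K}\!\left(\mu_{n}, \Sigma;\, [0,\infty)^{K}\right),
     \quad \Sigma = \hat{\Sigma}_{\mathrm{COSMIC}} \\[4pt]
  % Hierarchical model: covariance is itself a parameter to be learned
  \text{Hierarchical:}\quad
  P_{\cdot n} \mid \Sigma &\sim \mathrm{MVTN}_{K}\!\left(\mu_{n}, \Sigma;\, [0,\infty)^{K}\right),
     \quad \Sigma \sim \pi(\Sigma)
\end{align*}
```

The independent baseline is the special case of the second line in which $\Sigma$ is diagonal, so the off-diagonal entries of $\Sigma$ are exactly what the correlated and hierarchical models add.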