BayVel: A Bayesian Framework for RNA Velocity Estimation in Single-Cell Transcriptomics

📅 2025-05-06
🤖 AI Summary
Existing RNA velocity methods (e.g., scVelo) rely heavily on heuristic preprocessing, suffer from parameter unidentifiability, and lack principled uncertainty quantification, all of which undermine their statistical reliability. To address these limitations, we propose the first Bayesian hierarchical model that directly operates on raw UMI counts, enabling end-to-end probabilistic modeling within a fully Bayesian inference framework. Our approach obviates the need for normalization or gene filtering, inherently ensures parameter identifiability, and provides comprehensive posterior uncertainty estimates. We perform rigorous posterior inference via Markov chain Monte Carlo (MCMC). On synthetic data, our method accurately recovers ground-truth kinetic parameters. Applied to real pancreatic epithelial scRNA-seq data previously analyzed with scVelo, it does not reproduce scVelo's findings, which appear to be strongly influenced by post-processing. This work establishes a foundation for statistically sound, interpretable RNA velocity analysis.

📝 Abstract
RNA velocity is a model of gene expression dynamics designed to analyze single-cell RNA sequencing (scRNA-seq) data, and it has recently gained significant attention. However, despite its popularity, the model has raised several concerns, primarily related to three issues: its heavy dependence on data preprocessing, the need for post-processing of the results, and the limitations of the underlying statistical methodology. Current approaches, such as scVelo, suffer from notable statistical shortcomings. These include identifiability problems, reliance on heuristic preprocessing steps, and the absence of uncertainty quantification. To address these limitations, we propose BayVel, a Bayesian hierarchical model that directly models raw count data. BayVel resolves identifiability issues and provides posterior distributions for all parameters, including the RNA velocities themselves, without the need for any post-processing. We evaluate BayVel's performance using simulated datasets. While scVelo fails to accurately reconstruct parameters, even when data are simulated directly from the model assumptions, BayVel demonstrates strong accuracy and robustness. This highlights BayVel as a statistically rigorous and reliable framework for studying transcriptional dynamics in the context of RNA velocity modeling. When applied to a real dataset of pancreatic epithelial cells previously analyzed with scVelo, BayVel does not replicate the earlier findings, which appear to be strongly influenced by the post-processing. This supports concerns raised in other studies about the reliability of scVelo.
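To make the identifiability concern concrete, here is a minimal sketch based on the standard splicing kinetics commonly used in RNA velocity work (du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s). This is an assumption for illustration: the abstract does not state BayVel's exact parameterization, and the function names and numeric values below are hypothetical. The sketch shows that multiplying all rates by a constant c while dividing time by c leaves the unspliced and spliced abundances unchanged, yet scales the velocity by c, so snapshot data alone cannot fix the time scale.

```python
import math

def spliced_unspliced(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form solution of the assumed kinetics (requires beta != gamma):
    du/dt = alpha - beta*u,  ds/dt = beta*u - gamma*s."""
    eb = math.exp(-beta * t)
    eg = math.exp(-gamma * t)
    u = u0 * eb + alpha / beta * (1.0 - eb)
    s = (s0 * eg
         + alpha / gamma * (1.0 - eg)
         + (alpha - beta * u0) / (gamma - beta) * (eg - eb))
    return u, s

def velocity(u, s, beta, gamma):
    """RNA velocity, i.e. the time derivative of the spliced abundance."""
    return beta * u - gamma * s

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma, t = 4.0, 1.5, 0.8, 2.0
c = 3.0  # arbitrary rescaling of the time axis

u_a, s_a = spliced_unspliced(t, alpha, beta, gamma)
u_b, s_b = spliced_unspliced(t / c, c * alpha, c * beta, c * gamma)

# Same (u, s) state under both parameter sets, but velocities differ by c.
print("state A:", u_a, s_a, "velocity A:", velocity(u_a, s_a, beta, gamma))
print("state B:", u_b, s_b, "velocity B:", velocity(u_b, s_b, c * beta, c * gamma))
```

Whether this particular scaling symmetry is the one BayVel targets is not stated in the abstract; the sketch only illustrates why some constraint or prior information is needed before kinetic parameters and velocities can be compared across genes.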
Problem

Research questions and friction points this paper is trying to address.

Addresses the RNA velocity model's dependence on data preprocessing
Eliminates the need for post-processing and supplies the missing uncertainty quantification
Resolves identifiability issues in current RNA velocity methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian hierarchical model for raw count data
Resolves identifiability issues and provides posterior distributions for all parameters, including the velocities
No post-processing needed for RNA velocity estimation
Elena Sabbioni
Politecnico di Torino, Dpt. of Mathematical Science
Enrico Bibbona
Politecnico di Torino
stochastic models and related statistical inference
G. Mastrantonio
Politecnico di Torino, Dpt. of Mathematical Science
Guido Sanguinetti
Reader in Informatics, University of Edinburgh
Machine Learning, Systems Biology, Statistical modelling