Data-driven Discovery of Digital Twins in Biomedical Research

📅 2025-08-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Constructing digital twins from biological time series remains highly manual and is hampered by noise, high dimensionality, and latent variables. Method: This review surveys methods that infer digital twins automatically, chiefly sparse regression (particularly in Bayesian form) and symbolic regression, together with deep learning and large language models, and evaluates them against eight biological and methodological challenges. It argues for hybrid, modular frameworks that combine chemical reaction network priors, Bayesian uncertainty quantification, and the knowledge integration capabilities of deep learning. Contributions/Results: (1) Across the evaluation criteria, sparse regression generally outperforms symbolic regression, particularly with Bayesian frameworks; (2) deep learning and large language models enable new forms of prior knowledge integration, but their reliability and consistency need improvement; (3) a benchmarking framework is proposed for evaluating methods across all challenges.

📝 Abstract

Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet their development still relies on laborious data integration by human modelers, so automated approaches are critically needed. The success of data-driven system discovery in physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in biology, which presents unique challenges. Here, we review methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, related to noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. On these criteria, sparse regression generally outperforms symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
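
To make the sparse regression idea concrete, here is a minimal sketch of sequentially thresholded least squares (the SINDy-style approach) applied to a toy reaction chain A -> B -> degradation. Everything specific here is an assumption for illustration and not taken from the paper: the rate constants `k1` and `k2`, the four-term candidate library, the threshold value, the helper name `stlsq`, and the use of noise-free data with finite-difference derivatives.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_terms) candidate library evaluated on the data.
    dxdt:  (n_samples, n_states) estimated time derivatives.
    Returns a sparse (n_terms, n_states) coefficient matrix.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune weak terms
        for j in range(dxdt.shape[1]):
            active = ~small[:, j]
            if active.any():
                # refit the surviving terms for state j only
                xi[active, j] = np.linalg.lstsq(
                    theta[:, active], dxdt[:, j], rcond=None
                )[0]
    return xi

# Toy system (assumed): dx/dt = -k1*x, dy/dt = k1*x - k2*y, x(0)=1, y(0)=0.
k1, k2 = 1.0, 0.5
t = np.linspace(0.0, 10.0, 1001)
x = np.exp(-k1 * t)
y = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
X = np.column_stack([x, y])

# Derivatives are not observed, so estimate them by finite differences.
dXdt = np.gradient(X, t, axis=0, edge_order=2)

# Candidate library: constant, linear terms, and one bilinear term.
names = ["1", "x", "y", "x*y"]
theta = np.column_stack([np.ones_like(t), x, y, x * y])

xi = stlsq(theta, dXdt)
for j, state in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{xi[i, j]:+.3f}*{names[i]}" for i in range(len(names)) if xi[i, j] != 0.0]
    print(state, "=", " ".join(terms))
```

The sketch touches two of the challenges listed above: derivatives have to be estimated because they are not measured, and the result depends on which terms the candidate library contains. With noisy measurements, plain finite differences amplify the noise, so a smoothing step or an integral formulation would likely be needed.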

Problem

Research questions and friction points this paper is trying to address.

Automating digital twin inference from biological time series data

Addressing challenges like noisy data and latent variables

Evaluating methods for reliability and uncertainty quantification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated inference from biological time series data

Sparse regression with Bayesian uncertainty quantification

Hybrid frameworks combining mechanistic and deep learning
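
As a rough illustration of what Bayesian uncertainty quantification adds to regression-based system identification, the sketch below computes the closed-form Gaussian posterior over library coefficients. It is deliberately simplified and rests on stated assumptions: a zero-mean Gaussian prior stands in for the sparsity-inducing priors typically used in Bayesian sparse regression, the noise level `sigma` is treated as known, and the toy system, library, and function name `bayes_linear_posterior` are hypothetical.

```python
import numpy as np

def bayes_linear_posterior(theta, y, alpha=1e-2, beta=400.0):
    """Posterior N(mean, cov) over w for y = theta @ w + Gaussian noise.

    alpha: precision of the zero-mean Gaussian prior on w (assumed).
    beta:  precision of the observation noise (assumed known).
    """
    precision = alpha * np.eye(theta.shape[1]) + beta * theta.T @ theta
    cov = np.linalg.inv(precision)
    mean = beta * cov @ theta.T @ y
    return mean, cov

# Toy data (assumed): dx/dt = -x, with noise added to the derivative.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 200)
x = np.exp(-t)
sigma = 0.05
dxdt = -x + rng.normal(0.0, sigma, size=t.size)

# Candidate library: constant, x, x^2.
names = ["1", "x", "x^2"]
theta = np.column_stack([np.ones_like(t), x, x**2])

mean, cov = bayes_linear_posterior(theta, dxdt, beta=1.0 / sigma**2)
std = np.sqrt(np.diag(cov))
for name, m, s in zip(names, mean, std):
    print(f"{name:>4}: {m:+.3f} +/- {s:.3f}")
```

Each coefficient comes with a posterior standard deviation, so a term whose interval comfortably covers zero can be flagged as unsupported by the data instead of being removed by a hard threshold.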
