🤖 AI Summary
Biological modeling faces uncertain prior knowledge, heterogeneous and noisy data, partially observable systems, and complex network structures. Method: This work extends the physics-informed machine learning (PIML) paradigm to biology-informed machine learning (BIML). The framework rests on four pillars (uncertainty quantification, contextualized modeling, constrained latent structure inference, and scalability) and draws on PIML, foundation models, and large language models to combine human domain expertise with computational reasoning. Contribution/Results: The work provides a systematic definition and theoretical foundation of BIML; proposes a probabilistic, interpretable modeling paradigm tailored to the characteristics of biological systems; and outlines a roadmap for building a BIML ecosystem capable of handling multi-source, heterogeneous biological data. The aim is to transfer PIML methodologies to core problems in the life sciences.
📝 Abstract
Physics-Informed Machine Learning (PIML) has successfully integrated mechanistic understanding into machine learning, particularly in domains governed by well-known physical laws. This success has motivated efforts to apply PIML to biology, a field rich in dynamical systems but shaped by different constraints. Biological modeling, however, presents unique challenges: multi-faceted and uncertain prior knowledge, heterogeneous and noisy data, partial observability, and complex, high-dimensional networks. In this position paper, we argue that these challenges should not be seen as obstacles to PIML, but as catalysts for its evolution. We propose Biology-Informed Machine Learning (BIML): a principled extension of PIML that retains its structural grounding while adapting to the practical realities of biology. Rather than replacing PIML, BIML retools its methods to operate under softer, probabilistic forms of prior knowledge. We outline four foundational pillars as a roadmap for this transition: uncertainty quantification, contextualization, constrained latent structure inference, and scalability. Foundation Models and Large Language Models will be key enablers, bridging human expertise with computational modeling. We conclude with concrete recommendations to build the BIML ecosystem and channel PIML-inspired innovation toward challenges of high scientific and societal relevance.