Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models

📅 2024-05-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Genome-scale metabolic models (GEMs) suffer from incomplete gene function annotations, which limits their accuracy in predicting phenotypes under genetic perturbations and hinders synthetic biology design. To address this, we propose Boolean Matrix Logic Programming (BMLP), a paradigm that uses Boolean matrices to evaluate large logic programs, and use it to encode GEMs as interpretable Datalog programs. We implement $BMLP_{active}$, a system that integrates symbolic reasoning with active learning to enable targeted experimental design and iterative model refinement. Our approach achieves, for the first time on bacterial GEMs, high-accuracy inference of pairwise genetic interactions. It reduces the number of experimental samples required by over 60% compared with random sampling, substantially shrinking the experimental search space. This work advances interpretability-driven optimization of metabolic models and accelerates the practical deployment of closed-loop microbial engineering that integrates modeling, prediction and experimentation.
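To make the idea of encoding a GEM as a logic program concrete, here is a minimal sketch written in Python rather than Datalog. It assumes a toy network with invented names (geneA to geneE, reactions r1 to r4, metabolites such as g6p) and treats producibility of a target metabolite from the medium as a stand-in for growth; it is not the authors' encoding. Reactions and gene-reaction links are stored as facts, and the rules are evaluated bottom-up to a fixpoint, so every prediction under a knockout can be traced back to the facts that support it.

```python
# Hypothetical toy GEM fragment; all gene, reaction and metabolite names are invented.
# reaction facts: reaction id -> (substrates, products)
REACTIONS = {
    "r1": ({"glucose"}, {"g6p"}),
    "r2": ({"g6p"}, {"f6p"}),
    "r3": ({"f6p"}, {"pyruvate"}),
    "r4": ({"g6p"}, {"pyruvate"}),  # alternative route to pyruvate
}
# catalyses facts: reaction id -> genes that can each enable it (isoenzymes)
CATALYSES = {
    "r1": {"geneA"},
    "r2": {"geneB", "geneE"},
    "r3": {"geneC"},
    "r4": {"geneD"},
}
ALL_GENES = {"geneA", "geneB", "geneC", "geneD", "geneE"}


def producible(medium, present_genes):
    """Bottom-up least fixpoint of the (Datalog-style) rules
    enabled(R)  :- catalyses(G, R), present(G).
    produced(M) :- medium(M).
    produced(P) :- reaction(R, Ss, Ps), enabled(R), produced(S) for all S in Ss, P in Ps.
    """
    enabled = {r for r, genes in CATALYSES.items() if genes & present_genes}
    produced = set(medium)
    changed = True
    while changed:  # iterate until no rule derives a new fact
        changed = False
        for r in enabled:
            substrates, products = REACTIONS[r]
            if substrates <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


def grows(knockouts, medium=frozenset({"glucose"}), target="pyruvate"):
    """Proxy phenotype (an assumption): the cell 'grows' if the target is producible."""
    return target in producible(medium, ALL_GENES - set(knockouts))


for ko in [(), ("geneC",), ("geneD",), ("geneC", "geneD")]:
    print(ko, grows(ko))
# expected: True, True, True, False (only the double knockout blocks pyruvate)
```

In this toy network, knocking out geneC or geneD alone leaves an alternative route to pyruvate, whereas the double knockout removes both routes: the kind of pairwise gene interaction that a single-knockout screen would miss.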

📝 Abstract
Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, GEMs do not always correctly predict host behaviours, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable, logical representation using Datalog programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the growth of the experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.
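The core BMLP idea, evaluating logic programs with Boolean matrices, can be illustrated on a linear recursive Datalog program such as `path(X, Y) :- edge(X, Y).` and `path(X, Y) :- edge(X, Z), path(Z, Y).` This is an assumption-laden sketch, not the authors' implementation: it stores each Boolean matrix row as a Python int bitmask (a representation chosen here for illustration) and computes the least model of path/2 by repeated squaring, R ← R ∨ R·R, so each iteration derives all new facts at once and the loop stops after O(log n) rounds. Real GEM reactions with several substrates need a conjunction over substrates, which this simple reachability rule does not capture.

```python
from typing import List, Sequence, Tuple

# Assumed representation: row i of a Boolean matrix is an int whose bit j is M[i][j].
BoolMatrix = List[int]


def from_edges(n: int, pairs: Sequence[Tuple[int, int]]) -> BoolMatrix:
    """Build the n x n adjacency matrix for edge(i, j) facts."""
    m = [0] * n
    for i, j in pairs:
        m[i] |= 1 << j
    return m


def bool_mult(a: BoolMatrix, b: BoolMatrix) -> BoolMatrix:
    """Boolean product: (a.b)[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    result = []
    for bits in a:
        acc, k = 0, 0
        while bits:  # for every k with a[i][k] set, OR in row k of b
            if bits & 1:
                acc |= b[k]
            bits >>= 1
            k += 1
        result.append(acc)
    return result


def transitive_closure(edge: BoolMatrix) -> BoolMatrix:
    """Least model of path/2 by repeated squaring: R <- R OR R.R until a fixpoint."""
    closure = list(edge)
    while True:
        squared = bool_mult(closure, closure)
        nxt = [x | y for x, y in zip(closure, squared)]
        if nxt == closure:
            return closure
        closure = nxt


names = ["glucose", "g6p", "f6p", "pyruvate"]  # hypothetical metabolites
edges = from_edges(4, [(0, 1), (1, 2), (2, 3), (1, 3)])
path = transitive_closure(edges)
print([names[j] for j in range(4) if path[0] >> j & 1])  # ['g6p', 'f6p', 'pyruvate']
```

The glucose row of the result has bits 1, 2 and 3 set, meaning path(glucose, M) holds for g6p, f6p and pyruvate.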
Problem

Research questions and friction points this paper is trying to address.

Predicting gene interactions in metabolic networks accurately
Reducing experimental costs for learning genetic interactions
Optimizing metabolic models for reliable biological engineering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Boolean matrices evaluate large logic programs
Active learning guides cost-effective experimentation
Interpretable logic programs encode metabolic models
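Active learning of this kind can be sketched as repeatedly choosing the experiment whose outcome is most uncertain under the hypotheses still consistent with the data, then discarding the hypotheses that the observed outcome contradicts. The criterion used here (maximum outcome entropy under a uniform prior, with noise-free binary outcomes and no experiment costs) is an assumption for illustration; the source does not state $BMLP_{active}$'s exact selection rule. The toy task, finding which of eight hypothetical candidate genes carries a missing function by knocking out sets of genes, is likewise invented.

```python
import math
from itertools import combinations


def outcome_entropy(hypotheses, experiment, predict):
    """Entropy in bits of the predicted outcome, assuming a uniform prior over hypotheses."""
    counts = {}
    for h in hypotheses:
        y = predict(h, experiment)
        counts[y] = counts.get(y, 0) + 1
    n = len(hypotheses)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def active_learn(hypotheses, experiments, predict, run_experiment):
    """Query the most informative experiment, prune inconsistent hypotheses, repeat."""
    remaining, pool, queries = list(hypotheses), list(experiments), 0
    while len(remaining) > 1 and pool:
        best = max(pool, key=lambda e: outcome_entropy(remaining, e, predict))
        if outcome_entropy(remaining, best, predict) == 0:
            break  # no remaining experiment can tell the hypotheses apart
        pool.remove(best)
        observed = run_experiment(best)  # noise-free outcome assumed
        remaining = [h for h in remaining if predict(h, best) == observed]
        queries += 1
    return remaining, queries


# Hypothetical task: hypothesis h = "candidate gene h carries the missing function".
genes = list(range(8))
knockout_sets = [frozenset(c) for k in range(1, 9) for c in combinations(genes, k)]


def loses_growth(h, knockout):
    """Predicted outcome of knocking out a set of genes under hypothesis h."""
    return h in knockout


true_gene = 5  # hidden ground truth, used only to simulate the lab
found, n_queries = active_learn(genes, knockout_sets, loses_growth,
                                lambda ko: loses_growth(true_gene, ko))
print(found, n_queries)  # [5] 3
```

Each chosen knockout set splits the surviving candidates in half, so three experiments suffice for eight candidates; restricted to single-gene knockouts, the worst case would be seven.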
L. Ai
Department of Computing, Imperial College London, London, UK
Stephen Muggleton
Imperial College London
Inductive Logic Programming, Automation of Science, Relational Learning, Computational Logic, Logic Programming
Shi-shun Liang
Department of Life Science, Imperial College London, London, UK
Geoff S. Baldwin
Department of Life Science, Imperial College London, London, UK