Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data

📅 2025-06-05
🤖 AI Summary
Existing LLM-driven agents for automated ML experimentation generalize poorly and often fail to produce a working model on heterogeneous, high-dimensional bioinformatics data such as genomic and transcriptomic datasets. This paper introduces Agentomics-ML, a fully autonomous agent that follows predefined steps of an ML experimentation process and completes each step by interacting with the file system through Bash. After a classification model is trained, training and validation metrics feed a reflection step that flags issues such as overfitting and produces verbal feedback for later iterations, suggesting changes to data representation, model architecture, and hyperparameters. On several established genomic and transcriptomic benchmark datasets, the system outperforms existing state-of-the-art agent-based methods in both generalization and success rate. Expert-built models still lead on most of these datasets, but Agentomics-ML narrows the gap and reaches state-of-the-art performance on one benchmark.

📝 Abstract
The adoption of machine learning (ML) and deep learning methods has revolutionized molecular medicine by driving breakthroughs in genomics, transcriptomics, drug discovery, and biological systems modeling. The increasing quantity, multimodality, and heterogeneity of biological datasets demand automated methods that can produce generalizable predictive models. Recent developments in large language model-based agents have shown promise for automating end-to-end ML experimentation on structured benchmarks. However, when applied to heterogeneous computational biology datasets, these methods struggle with generalization and success rates. Here, we introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model and the necessary files for reproducible training and inference. Our method follows predefined steps of an ML experimentation process, repeatedly interacting with the file system through Bash to complete individual steps. Once an ML model is produced, training and validation metrics provide scalar feedback to a reflection step to identify issues such as overfitting. This step then creates verbal feedback for future iterations, suggesting adjustments to steps such as data representation, model architecture, and hyperparameter choices. We have evaluated Agentomics-ML on several established genomic and transcriptomic benchmark datasets and show that it outperforms existing state-of-the-art agent-based methods in both generalization and success rates. While state-of-the-art models built by domain experts still lead in absolute performance on the majority of the computational biology datasets used in this work, Agentomics-ML narrows the gap for fully autonomous systems and achieves state-of-the-art performance on one of the used benchmark datasets. The code is available at https://github.com/BioGeMT/Agentomics-ML.
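The reflection step described in the abstract can be illustrated with a minimal sketch. In Agentomics-ML the verbal feedback is presumably produced by a language model; the rule-based function below only stands in for that call so the example is runnable. The names `Metrics` and `reflect`, the use of accuracy as the metric, and both thresholds are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Scalar feedback from one iteration (hypothetical container)."""
    train_accuracy: float
    val_accuracy: float


def reflect(metrics: Metrics, gap_threshold: float = 0.10, floor: float = 0.60) -> str:
    """Turn scalar metrics into verbal feedback for the next iteration.

    Rule-based stand-in for an LLM reflection call. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    gap = metrics.train_accuracy - metrics.val_accuracy
    if gap > gap_threshold:
        # Large train/validation gap: the model likely memorizes training data.
        return (
            f"Possible overfitting: train accuracy exceeds validation by {gap:.2f}. "
            "Consider stronger regularization or a simpler architecture."
        )
    if metrics.val_accuracy < floor:
        # Both scores are low: the model or the data representation is too weak.
        return (
            "Possible underfitting: validation accuracy is low. "
            "Consider a different data representation or a larger model."
        )
    return "No obvious issue: consider tuning hyperparameters for further gains."


if __name__ == "__main__":
    print(reflect(Metrics(train_accuracy=0.99, val_accuracy=0.70)))
```

The returned string plays the role of the verbal feedback that later iterations would receive alongside the task description.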
Problem

Research questions and friction points this paper is trying to address.

Automates ML experimentation for genomic and transcriptomic data
Addresses generalization challenges in computational biology datasets
Improves success rates of autonomous agent-based ML systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous agent for ML experimentation
Bash interaction for file system tasks
Reflection step with verbal feedback
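
The Bash interaction listed above could be wired up roughly as follows. This is a sketch under stated assumptions, not the project's actual tool interface: `run_bash` and `CommandResult` are hypothetical names, and the timeout default is arbitrary. It relies only on Python's standard `subprocess.run` and assumes a `bash` executable is on the PATH.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandResult:
    """Outcome of one shell command (hypothetical container)."""
    returncode: int
    stdout: str
    stderr: str


def run_bash(command: str, cwd: Optional[str] = None, timeout_s: float = 60.0) -> CommandResult:
    """Run one command through Bash and capture its output for the agent.

    Assumes `bash` is available on the PATH. A command that runs longer than
    `timeout_s` seconds raises subprocess.TimeoutExpired.
    """
    completed = subprocess.run(
        ["bash", "-c", command],
        cwd=cwd,               # working directory, e.g. the agent's workspace
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode bytes to str
        timeout=timeout_s,
    )
    return CommandResult(completed.returncode, completed.stdout, completed.stderr)


if __name__ == "__main__":
    result = run_bash("echo hello")
    print(result.returncode, result.stdout.strip())
```

Returning the exit code together with stdout and stderr gives the agent enough signal to decide whether a step succeeded or needs to be retried.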
Vlastimil Martinek
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Andrea Gariboldi
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Dimosthenis Tzimotoudis
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Aitor Alberdi Escudero
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Edward Blake
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
David Cechak
Central European Institute of Technology, Masaryk University, Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
Luke Cassar
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Alessandro Balestrucci
Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta; Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
Panagiotis Alexiou
ERA Chair, University of Malta
Bioinformatics