OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

📅 2025-05-20
🤖 AI Summary
Genomic foundation models (GFMs) are advancing rapidly, yet they face critical reproducibility bottlenecks: opaque training data, non-interoperable model architectures, fragmented evaluation protocols, and insufficient interpretability. To address these challenges, we introduce the first modular benchmarking platform designed specifically for GFMs, unifying four core layers: data, models, evaluation, and interpretability. The platform provides standardized interface protocols, an automated evaluation pipeline, an open model-integration framework, and an embedded interpretability toolkit. It enables one-click, end-to-end reproducible assessment across five major benchmark suites and has integrated more than 31 open-source GFMs. This work systematically resolves fundamental reproducibility barriers in genomic AI, improving research credibility and the efficiency of cross-institutional collaboration.

📝 Abstract
The code of nature, embedded in DNA and RNA genomes since the origin of life, holds immense potential to impact both humans and ecosystems through genome modeling. Genomic Foundation Models (GFMs) have emerged as a transformative approach to decoding the genome. As GFMs scale up and reshape the landscape of AI-driven genomics, the field faces an urgent need for rigorous and reproducible evaluation. We present OmniGenBench, a modular benchmarking platform designed to unify the data, model, benchmarking, and interpretability layers across GFMs. OmniGenBench enables standardized, one-command evaluation of any GFM across five benchmark suites, with seamless integration of over 31 open-source models. Through automated pipelines and community-extensible features, the platform addresses critical reproducibility challenges, including data transparency, model interoperability, benchmark fragmentation, and black-box interpretability. OmniGenBench aims to serve as foundational infrastructure for reproducible genomic AI research, accelerating trustworthy discovery and collaborative innovation in the era of genome-scale modeling.
Problem

Research questions and friction points this paper is trying to address.

Standardize evaluation of Genomic Foundation Models (GFMs)
Address reproducibility challenges in genomic AI research
Unify the data, model, and benchmarking layers for GFMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular platform for genomic model benchmarking
Standardized one-command evaluation across benchmarks
Automated pipelines for reproducibility and interoperability
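The modular design sketched above can be illustrated with a toy registry that unifies a model layer (anything exposing a standard predict interface) and an evaluation layer (named benchmark suites) behind a single evaluation call. This is a minimal, hypothetical sketch of the pattern, not the actual OmniGenBench API; all names (`BenchmarkHub`, `register_model`, `evaluate`, the toy GC-content baseline) are illustrative assumptions.

```python
# Hypothetical sketch of a modular benchmarking registry.
# None of these names come from OmniGenBench itself.
from typing import Callable, Dict, List, Tuple

class BenchmarkHub:
    """Toy registry unifying the model and evaluation layers."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], float]] = {}
        self._suites: Dict[str, List[Tuple[str, int]]] = {}

    def register_model(self, name: str, predict: Callable[[str], float]) -> None:
        # A "model" is anything exposing a standard predict(sequence) interface,
        # which is what makes heterogeneous GFMs interoperable.
        self._models[name] = predict

    def register_suite(self, name: str, tasks: List[Tuple[str, int]]) -> None:
        # A "suite" is a named list of (sequence, label) tasks.
        self._suites[name] = tasks

    def evaluate(self, model_name: str, suite_name: str) -> float:
        # One-call evaluation: run every task in the suite, report accuracy.
        predict = self._models[model_name]
        tasks = self._suites[suite_name]
        correct = sum(1 for seq, label in tasks if round(predict(seq)) == label)
        return correct / len(tasks)

hub = BenchmarkHub()
hub.register_suite("toy-suite", [("ACGT", 1), ("TTTT", 0)])
# A trivial GC-content baseline standing in for a real GFM.
hub.register_model("gc-baseline",
                   lambda s: (s.count("G") + s.count("C")) / len(s) >= 0.5)
print(hub.evaluate("gc-baseline", "toy-suite"))  # → 1.0
```

Because every model and suite goes through the same registry interface, adding a new GFM or benchmark never requires touching the evaluation loop, which is the interoperability property the platform's standardized protocols aim for.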
Heng Yang
Department of Computer Science, University of Exeter, Exeter, UK
Jack Cole
Department of Computer Science, University of Exeter, Exeter, UK
Yuan Li
National University of Defense Technology, Changsha, China
Renzhi Chen
Qiyuan Lab
Geyong Min
University of Exeter
Ke Li
Department of Computer Science, University of Exeter, Exeter, UK