seqme: a Python library for evaluating biological sequence design

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current biomolecular sequence design methods lack unified, reproducible evaluation standards, which hinders fair and rigorous performance comparison. To address this, the authors introduce seqme, a modular, open-source Python library that brings together three categories of model-agnostic metrics: sequence-based, embedding-based, and property-based. It supports evaluation of both one-shot and iterative design methods across diverse sequence modalities, including small molecules, DNA, RNA (ncRNA and mRNA), peptides, and proteins. The library bundles pretrained embedding models, machine-learning property predictors, sequence alignment tools, and visualization functions for diagnostic analysis. By gathering these metrics in one library, seqme is intended to make the evaluation of generative sequence models more standardized, comparable across methods, and reproducible.
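The difference between one-shot and iterative evaluation can be sketched in a few lines of plain Python. This is only an illustration of the idea, not seqme's API: the names `evaluate_one_shot`, `evaluate_iterative`, `uniqueness`, and `gc_content` are hypothetical, and the DNA batches are made up.

```python
from typing import Callable, Dict, List, Sequence

# A model-agnostic metric only sees the designed sequences, never the generator.
Metric = Callable[[Sequence[str]], float]


def uniqueness(seqs: Sequence[str]) -> float:
    """Fraction of distinct sequences in a batch."""
    return len(set(seqs)) / len(seqs)


def gc_content(seqs: Sequence[str]) -> float:
    """Mean fraction of G/C bases per sequence (a toy DNA property)."""
    return sum((s.count("G") + s.count("C")) / len(s) for s in seqs) / len(seqs)


def evaluate_one_shot(seqs: Sequence[str], metrics: Dict[str, Metric]) -> Dict[str, float]:
    """One-shot design: score a single batch of designed sequences."""
    return {name: metric(seqs) for name, metric in metrics.items()}


def evaluate_iterative(
    rounds: Sequence[Sequence[str]], metrics: Dict[str, Metric]
) -> List[Dict[str, float]]:
    """Iterative design: score the batch produced at every optimization round."""
    return [evaluate_one_shot(batch, metrics) for batch in rounds]


metrics = {"uniqueness": uniqueness, "gc_content": gc_content}
rounds = [["ATAT", "ATAT"], ["ATGC", "GCGC"]]  # made-up batches from two rounds
history = evaluate_iterative(rounds, metrics)
for i, scores in enumerate(history):
    print(f"round {i}: {scores}")
```

Because each metric takes only a list of strings, the same functions can score any design method; an iterative method simply yields one score dictionary per round, which can then be plotted as a trajectory.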

📝 Abstract
Recent advances in computational methods for designing biological sequences have sparked the development of metrics to evaluate these methods' performance in terms of the fidelity of the designed sequences to a target distribution and their attainment of desired properties. However, a single software library implementing these metrics was lacking. In this work, we introduce seqme, a modular and highly extendable open-source Python library containing model-agnostic metrics for evaluating computational methods for biological sequence design. seqme considers three groups of metrics: sequence-based, embedding-based, and property-based, and is applicable to a wide range of biological sequences: small molecules, DNA, ncRNA, mRNA, peptides and proteins. The library offers a number of embedding and property models for biological sequences, as well as diagnostics and visualization functions to inspect the results. seqme can be used to evaluate both one-shot and iterative computational design methods.
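
To make the three metric groups concrete, here is a minimal, standard-library-only sketch with one toy metric per group. It does not use or reproduce seqme's API: every function name below is hypothetical, the amino-acid composition vector stands in for a pretrained embedding model, the distance between mean embeddings stands in for a proper distributional distance, and the net-charge count stands in for a learned property predictor.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def novelty(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Sequence-based: fraction of generated sequences absent from the reference set."""
    ref = set(reference)
    return sum(s not in ref for s in generated) / len(generated)


def composition_embedding(seq: str) -> List[float]:
    """Toy embedding: amino-acid frequency vector (assumed stand-in for a pretrained model)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def mean_embedding_distance(
    generated: Sequence[str],
    reference: Sequence[str],
    embed: Callable[[str], List[float]],
) -> float:
    """Embedding-based: Euclidean distance between the mean embeddings of two sets."""

    def mean(seqs: Sequence[str]) -> List[float]:
        vecs = [embed(s) for s in seqs]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    g, r = mean(generated), mean(reference)
    return sqrt(sum((a - b) ** 2 for a, b in zip(g, r)))


def net_charge(seq: str) -> int:
    """Toy property: number of K/R residues minus number of D/E residues."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def mean_property(generated: Sequence[str], prop: Callable[[str], float]) -> float:
    """Property-based: average predicted property over the generated set."""
    return sum(prop(s) for s in generated) / len(generated)


reference = ["KKLLKK", "DDEEAA", "ACDEFG"]  # made-up peptides
generated = ["KKLLKK", "KRKRAA", "GGGGGG"]  # made-up designs

report = {
    "novelty": novelty(generated, reference),
    "mean_embedding_distance": mean_embedding_distance(
        generated, reference, composition_embedding
    ),
    "mean_net_charge": mean_property(generated, net_charge),
}
print(report)
```

The first two metrics measure fidelity to a target distribution (here, the reference set), while the third measures attainment of a desired property, mirroring the two evaluation goals named in the abstract.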
Problem

Research questions and friction points this paper is trying to address.

Lack of a unified software library for biological sequence design metrics
Need for model-agnostic evaluation of computational sequence design methods
Need for comprehensive metrics covering diverse biological sequence types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Python library for biological sequence evaluation
Model-agnostic metrics for sequence design
Modular open-source implementation of metrics
Rasmus Moller-Larsen
Institute of AI for Health, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
Adam Izdebski
Institute of AI for Health, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
Jan Olszewski
Student, University of Warsaw
Deep Learning
Pankhil Gawade
Helmholtz Munich
Gen AI for Drug Design
Michal Kmicikiewicz
Institute of AI for Health, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
Wojciech Zarzecki
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097, Warszawa, Poland; Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 02-097, Warszawa, Poland
Ewa Szczurek
Associate Professor at University of Warsaw / Institute AI for Health, Helmholtz Zentrum München
computational biology, machine learning, artificial intelligence