seqme: a Python library for evaluating biological sequence design

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Biomolecular sequence design methods lack a unified, reproducible evaluation standard, which hinders fair and rigorous performance comparison. To address this, the authors introduce seqme, a modular, open-source Python library that brings together three categories of model-agnostic metrics: sequence-based, embedding-based, and property-based. It supports evaluation of both one-shot and iterative design methods across diverse sequence types, including small molecules, DNA, ncRNA, mRNA, peptides, and proteins. The library also provides embedding models and property models for biological sequences, along with diagnostics and visualization functions for inspecting results. By collecting these metrics in one place, it aims to make the evaluation of generative sequence models more standardized, comparable, and reproducible.

📝 Abstract

Recent advances in computational methods for designing biological sequences have sparked the development of metrics to evaluate these methods' performance, both in terms of the fidelity of the designed sequences to a target distribution and in terms of their attainment of desired properties. However, a single software library implementing these metrics was lacking. In this work we introduce seqme, a modular and highly extensible open-source Python library containing model-agnostic metrics for evaluating computational methods for biological sequence design. seqme considers three groups of metrics (sequence-based, embedding-based, and property-based) and is applicable to a wide range of biological sequences: small molecules, DNA, ncRNA, mRNA, peptides, and proteins. The library offers a number of embedding and property models for biological sequences, as well as diagnostics and visualization functions to inspect the results. seqme can be used to evaluate both one-shot and iterative computational design methods.
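
To make the "sequence-based" metric group more concrete, here is a minimal sketch in plain Python of three metrics commonly used for this purpose: uniqueness, novelty against a reference set, and diversity as mean pairwise normalized edit distance. This is not seqme's API; the function names, the choice of metrics, and the toy sequences are all assumptions made for illustration, and the library's own implementations may differ.

```python
from itertools import combinations


def uniqueness(designed):
    """Fraction of designed sequences that are distinct (hypothetical helper)."""
    return len(set(designed)) / len(designed)


def novelty(designed, reference):
    """Fraction of designed sequences absent from the reference set."""
    reference_set = set(reference)
    return sum(seq not in reference_set for seq in designed) / len(designed)


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def diversity(designed):
    """Mean pairwise edit distance, each normalized by the longer sequence."""
    pairs = list(combinations(designed, 2))
    if not pairs:
        return 0.0
    total = sum(edit_distance(a, b) / max(len(a), len(b)) for a, b in pairs)
    return total / len(pairs)


# Toy peptide-like strings, invented for this example only.
designed = ["ACDE", "ACDE", "ACDF", "WXYZ"]
reference = ["ACDE", "KLMN"]

report = {
    "uniqueness": uniqueness(designed),
    "novelty": novelty(designed, reference),
    "diversity": diversity(designed),
}
print(report)
```

These functions only look at the sequences themselves, which is what makes them model-agnostic: the same calls apply whether the sequences came from a one-shot generator or from one round of an iterative design loop. Embedding-based and property-based metrics would additionally need an embedding model or a property predictor, which this sketch deliberately leaves out.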

Problem

Research questions and friction points this paper is trying to address.

Lack of a unified software library for biological sequence design metrics

Need for model-agnostic evaluation of computational sequence design methods

Need for comprehensive metrics that cover diverse biological sequence types

Innovation

Methods, ideas, or system contributions that make the work stand out.

Python library for biological sequence evaluation

Model-agnostic metrics for sequence design

Modular open-source implementation of metrics
