Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python

📅 2024-07-18
🏛️ SoftwareX
📈 Citations: 6
✨ Influential: 0

🤖 AI Summary
The Python ecosystem has lacked an efficient, user-friendly, and feature-rich open-source library for molecular fingerprint computation, which complicates chemoinformatics tasks such as molecular property prediction and virtual screening. To address this, the authors introduce scikit-fingerprints, an open-source library that exposes a scikit-learn-compatible interface and offers over 30 molecular fingerprints. It builds on underlying cheminformatics engines (e.g., RDKit), is optimized for batch fingerprint generation, and uses parallel computation to process large molecular datasets efficiently. Because the fingerprints follow the scikit-learn interface, they integrate directly into machine learning pipelines.

📝 Abstract
In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
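To make the "scikit-learn interface" idea concrete, here is a minimal, dependency-free sketch of the transformer convention (`fit`, `transform`, `fit_transform`, `get_params`, `set_params`) applied to fingerprint computation. Everything in it is an assumption for illustration: `ToyHashedFingerprint` is a hypothetical class, not part of scikit-fingerprints, and the hashed character n-gram bit vector is a stand-in that carries no chemical meaning.

```python
import hashlib


class ToyHashedFingerprint:
    """Hypothetical stand-in for a fingerprint transformer (not the library's class)."""

    def __init__(self, fp_size=64, ngram=2):
        self.fp_size = fp_size
        self.ngram = ngram

    def get_params(self, deep=True):
        # scikit-learn convention: expose constructor arguments as a dict
        return {"fp_size": self.fp_size, "ngram": self.ngram}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Fingerprints are stateless, so fitting learns nothing
        return self

    def transform(self, X):
        # One fixed-length bit vector per input SMILES string
        return [self._fingerprint(smiles) for smiles in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def _fingerprint(self, smiles):
        # Toy rule: hash every character n-gram into one of fp_size bit positions
        bits = [0] * self.fp_size
        for i in range(len(smiles) - self.ngram + 1):
            fragment = smiles[i:i + self.ngram]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.fp_size
            bits[index] = 1
        return bits


smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
features = ToyHashedFingerprint(fp_size=64).fit_transform(smiles_list)
```

Because the object follows the transformer convention, the resulting `features` rows can be passed to any estimator that accepts a fixed-width feature matrix, which is the integration benefit the abstract describes.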
Problem

Research questions and friction points this paper is trying to address.

Develops a Python package for molecular fingerprint computation
Enables efficient processing of large molecular datasets
Simplifies chemoinformatics tasks like property prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scikit-learn interface for chemoinformatics
Parallel computation for large datasets
Over 30 molecular fingerprints included
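The parallel-computation point can be illustrated with a short sketch: split the input molecules into chunks, compute each chunk in a worker, and concatenate the results in the original order. This is an assumed pattern, not the library's actual implementation. `toy_fingerprint`, `compute_chunk`, `split_into_chunks`, and `parallel_fingerprints` are hypothetical names, and a thread pool is used only to keep the sketch portable; CPU-bound fingerprinting in Python would normally run in worker processes.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_fingerprint(smiles, fp_size=32):
    """Hypothetical placeholder: sets one bit per distinct character code."""
    bits = [0] * fp_size
    for ch in smiles:
        bits[ord(ch) % fp_size] = 1
    return bits


def compute_chunk(chunk):
    # Work done by a single worker: fingerprint every molecule in its chunk
    return [toy_fingerprint(smiles) for smiles in chunk]


def split_into_chunks(items, n_chunks):
    # Contiguous, near-equal chunks; earlier chunks take the remainder
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_fingerprints(smiles_list, n_jobs=2):
    if not smiles_list:
        return []
    chunks = split_into_chunks(smiles_list, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in submission order, so row order is preserved
        results = pool.map(compute_chunk, chunks)
        return [fp for chunk_result in results for fp in chunk_result]


molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CN", "O=C=O"]
parallel_result = parallel_fingerprints(molecules, n_jobs=2)
sequential_result = compute_chunk(molecules)
```

Chunking rather than submitting one task per molecule keeps scheduling overhead small relative to the work per task, which matters when each individual fingerprint is cheap to compute.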