AI Summary
The Python ecosystem lacks an efficient, user-friendly, and feature-complete open-source library for molecular fingerprint computation, hindering reproducibility and scalability in cheminformatics tasks such as property prediction and virtual screening. To address this, we introduce FingerPy, the first industrial-grade, open-source molecular fingerprinting library fully compliant with the scikit-learn API and supporting over 30 mainstream fingerprint types. Its core innovations include deep integration of underlying cheminformatics engines (e.g., RDKit), a hybrid multiprocessing/multithreading parallelization strategy, and algorithmic and memory-access optimizations specifically designed for batch fingerprint generation. Experimental evaluation demonstrates that FingerPy achieves state-of-the-art computational performance on widely used fingerprints (e.g., ECFP4, MACCS), outperforming existing open-source tools by 2–5×. Moreover, its native scikit-learn compatibility enables seamless integration into machine learning pipelines, significantly enhancing modeling efficiency and cross-platform reproducibility.
Abstract
In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
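To make the interface pattern concrete, the following is a minimal, standard-library-only sketch of what a scikit-learn-style fingerprint transformer with a parallel batch mode can look like. It is not the library's code: the class name `ToyHashedFingerprint`, its parameters `n_bits`, `ngram`, and `n_jobs`, and the character n-gram hashing are all assumptions made for illustration, and the resulting bit vectors carry no chemical meaning. Threads are used only to keep the sketch short; a real implementation would more likely distribute work across processes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class ToyHashedFingerprint:
    """Hypothetical stand-in for a fingerprint transformer (illustration only)."""

    def __init__(self, n_bits=64, ngram=2, n_jobs=1):
        self.n_bits = n_bits    # length of each output bit vector
        self.ngram = ngram      # size of the hashed character windows
        self.n_jobs = n_jobs    # number of worker threads for batch mode

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned from the data.
        return self

    def _encode(self, smiles):
        # Hash every character n-gram of the SMILES string into one bit.
        bits = [0] * self.n_bits
        for i in range(len(smiles) - self.ngram + 1):
            chunk = smiles[i:i + self.ngram]
            digest = hashlib.sha256(chunk.encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % self.n_bits] = 1
        return bits

    def transform(self, X):
        # Serial path for n_jobs == 1, thread pool otherwise.
        if self.n_jobs == 1:
            return [self._encode(s) for s in X]
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            # pool.map preserves input order, so rows line up with X.
            return list(pool.map(self._encode, X))

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
serial = ToyHashedFingerprint(n_bits=64, n_jobs=1).fit_transform(smiles)
parallel = ToyHashedFingerprint(n_bits=64, n_jobs=2).fit_transform(smiles)
```

Because the object exposes `fit`, `transform`, and `fit_transform`, it follows the same calling convention as scikit-learn transformers, which is the property that lets a fingerprint step sit in front of an estimator in a pipeline. The serial and parallel calls return the same rows in the same order.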