🤖 AI Summary
This work introduces an open-source Direct Coupling Analysis (DCA) framework for multitask modeling of protein and RNA sequences. It covers residue-residue contact prediction, mutational-effect estimation, sequence-library scoring, and de novo sequence design. Methodologically, it trains Boltzmann machines (Potts models) by maximum-likelihood estimation with L₂ regularization, and it exposes a single front-end interface across programming languages (C++, Julia, Python) and hardware (CPU, GPU). It supports both dense and sparse learning protocols and runs downstream tasks end to end. Experiments report >70% Top-L/5 contact prediction accuracy on standard protein family benchmarks, training on sequence libraries of more than ten million sequences, and over 10× higher throughput on GPU than with CPU-only execution.
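Boltzmann machine learning of a DCA model is gradient ascent on the log-likelihood. Each field and coupling moves in proportion to the gap between a data frequency and the corresponding model frequency, and an L₂ term pulls parameters toward zero. The sketch below illustrates that update on a toy problem. It is not the adabmDCA code. The function names are hypothetical, and model frequencies are computed by exact enumeration of all q^L sequences. That is only feasible at toy sizes; practical DCA codes estimate these averages by MCMC sampling.

```python
import itertools
import math

# Minimal sketch, NOT the adabmDCA implementation: Boltzmann machine learning of a
# Potts model P(s) ∝ exp(sum_i h[i][s_i] + sum_{i<j} J[i][j][s_i][s_j]) by gradient
# ascent on the L2-regularized log-likelihood. All function names are hypothetical.
# Model averages come from exact enumeration (toy sizes only), not from MCMC.

def log_weight(seq, h, J):
    """Unnormalized log-probability, i.e. minus the Potts energy."""
    L = len(seq)
    w = sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            w += J[i][j][seq[i]][seq[j]]
    return w

def frequencies(weighted_seqs, L, q):
    """One-site f1[i][a] and two-site f2[i][j][a][b] (i < j) frequencies."""
    f1 = [[0.0] * q for _ in range(L)]
    f2 = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    for seq, w in weighted_seqs:
        for i in range(L):
            f1[i][seq[i]] += w
            for j in range(i + 1, L):
                f2[i][j][seq[i]][seq[j]] += w
    return f1, f2

def model_frequencies(h, J, L, q):
    """Exact model marginals from the normalized Boltzmann distribution."""
    seqs = list(itertools.product(range(q), repeat=L))
    logw = [log_weight(s, h, J) for s in seqs]
    top = max(logw)  # subtract the max for numerical stability
    w = [math.exp(x - top) for x in logw]
    Z = sum(w)
    return frequencies([(s, x / Z) for s, x in zip(seqs, w)], L, q)

def fit_potts(msa, q, lr=0.4, lam=1e-3, steps=1000):
    """Each parameter moves by lr * (data moment - model moment - lam * parameter)."""
    L = len(msa[0])
    f1, f2 = frequencies([(s, 1.0 / len(msa)) for s in msa], L, q)
    h = [[0.0] * q for _ in range(L)]
    J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    for _ in range(steps):
        p1, p2 = model_frequencies(h, J, L, q)
        for i in range(L):
            for a in range(q):
                h[i][a] += lr * (f1[i][a] - p1[i][a] - lam * h[i][a])
            for j in range(i + 1, L):
                for a in range(q):
                    for b in range(q):
                        J[i][j][a][b] += lr * (f2[i][j][a][b] - p2[i][j][a][b]
                                               - lam * J[i][j][a][b])
    return h, J
```

For example, `fit_potts([(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 0, 1), (1, 1, 1)], q=2)` should return parameters whose one- and two-site model frequencies match the data up to a small bias introduced by the L₂ term. Replacing `model_frequencies` with an MCMC estimate turns this into the usual sampling-based Boltzmann machine learning loop.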
📝 Abstract
In this methods article, we provide a flexible but easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on how to use it. The package adabmDCA 2.0 is available in several programming languages (C++, Julia, Python) and runs on different architectures (single-core and multi-core CPU, GPU) through a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it directly addresses common downstream tasks such as residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries, and generation of artificial sequences for sequence design. It is readily applicable to protein and RNA sequence data.
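Once fields h and couplings J are trained, each of the downstream tasks listed above reduces to a simple operation on them. The pure-Python sketch below shows one version of each. The function names and the energy sign convention are assumptions made for illustration. The contact score uses the Frobenius norm of zero-sum-gauged coupling blocks followed by the average-product correction (APC), a common choice in the DCA literature. None of this should be read as a description of how adabmDCA 2.0 itself computes these quantities.

```python
import math
import random

# Minimal sketch of DCA downstream tasks on a trained Potts model with fields h[i][a]
# and couplings J[i][j][a][b] (stored for i < j). Function names are hypothetical,
# not the adabmDCA interface. Assumed convention:
#   E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j),  P(s) proportional to exp(-E(s)).

def energy(seq, h, J):
    L = len(seq)
    e = -sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]
    return e

def score_library(seqs, h, J):
    """Library scoring: lower energy means more probable under the model."""
    return [energy(s, h, J) for s in seqs]

def mutation_effect(wt, pos, aa, h, J):
    """Energy change of the single mutant wt[pos] -> aa (negative = predicted favorable)."""
    mut = list(wt)
    mut[pos] = aa
    return energy(mut, h, J) - energy(wt, h, J)

def zero_sum(block):
    """Zero-sum gauge of one q x q coupling block: remove row and column means."""
    q = len(block)
    row = [sum(block[a]) / q for a in range(q)]
    col = [sum(block[a][b] for a in range(q)) / q for b in range(q)]
    mean = sum(row) / q
    return [[block[a][b] - row[a] - col[b] + mean for b in range(q)] for a in range(q)]

def contact_scores(J, L):
    """Frobenius norm of each gauged block, then average-product correction (APC)."""
    F = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            g = zero_sum(J[i][j])
            F[i][j] = F[j][i] = math.sqrt(sum(x * x for r in g for x in r))
    mean_i = [sum(F[i][k] for k in range(L) if k != i) / (L - 1) for i in range(L)]
    mean_all = sum(mean_i) / L
    return {(i, j): F[i][j] - mean_i[i] * mean_i[j] / mean_all
            for i in range(L) for j in range(i + 1, L)}

def gibbs_sample(h, J, L, q, sweeps=100, seed=0):
    """Sequence generation: Gibbs sampling from P(s) proportional to exp(-E(s))."""
    rng = random.Random(seed)
    s = [rng.randrange(q) for _ in range(L)]
    for _ in range(sweeps):
        for i in range(L):
            logits = []
            for a in range(q):
                x = h[i][a]
                for j in range(L):
                    if j < i:
                        x += J[j][i][s[j]][a]
                    elif j > i:
                        x += J[i][j][a][s[j]]
                logits.append(x)
            top = max(logits)
            weights = [math.exp(x - top) for x in logits]
            s[i] = rng.choices(range(q), weights=weights)[0]
    return s
```

For example, consider a length-4 binary model whose only nonzero coupling block is `J[0][3] = [[1, -1], [-1, 1]]`. There, `contact_scores` ranks pair (0, 3) first, and mutating position 3 of (0, 1, 1, 0) from 0 to 1 raises the energy by 2.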