Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses structure prediction for large biomolecular assemblies, which single-GPU memory limits have restricted to systems of a few thousand residues. The authors propose Fold-CP, a multi-GPU context parallelism framework that uses the Boltz models as reference architectures. Custom multidimensional parallel primitives distribute both the dense triangular updates and the window-batched local attention across devices, so that per-device memory scales as O(N²/P) for an N-token input on P GPUs. This enables training and inference of co-folding models on very large inputs, including structure prediction for assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. Fold-CP enabled scoring of over 90% of the CORUM database and folding of the PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping.
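To make the O(N²/P) scaling concrete, the following back-of-the-envelope sketch estimates the per-device size of a single N x N pair tensor when it is split evenly across P GPUs. The channel width of 128, the 2-byte element size, and the function name `pair_bytes_per_device` are illustrative assumptions, not values taken from the paper, and the estimate covers one tensor only, not a model's full memory footprint.

```python
def pair_bytes_per_device(n_tokens, n_gpus, channels=128, bytes_per_elem=2):
    """Bytes of one N x N x C pair tensor held per device, assuming an even split.

    channels=128 and bytes_per_elem=2 (16-bit floats) are assumed values
    chosen for illustration only.
    """
    total = n_tokens * n_tokens * channels * bytes_per_elem
    return total / n_gpus


GIB = 1024 ** 3

# Same input size, increasing device counts: the per-device share shrinks as 1/P.
for p in (1, 8, 64):
    size_gib = pair_bytes_per_device(30_000, p) / GIB
    print(f"P={p:>2}: {size_gib:.1f} GiB per device")
```

Under these assumptions, one such tensor for 30,000 tokens is roughly 215 GiB unsharded and a little over 3 GiB per device at P = 64, which suggests why a quadratic activation of this size would not fit on one GPU without sharding.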

📝 Abstract

Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, which impose a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open-source reference architectures and implement custom multidimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling: for an N-token input distributed across P GPUs, per-device memory scales as $O(N^2/P)$, enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as the folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.
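As a rough illustration of why dense triangular updates can be parallelized over a multidimensional device grid, the sketch below tiles the core contraction of an AlphaFold-style outgoing triangular update, z[i][j] = Σ_k a[i][k] · b[j][k], over a grid x grid layout. This is a single-process toy under stated assumptions: gating, normalization, and the channel dimension are omitted, the function names are hypothetical, and the tiling scheme is a generic blocked contraction rather than Fold-CP's actual primitives.

```python
def triangle_update_full(a, b):
    """Reference: z[i][j] = sum over k of a[i][k] * b[j][k], computed in one place."""
    n = len(a)
    return [[sum(a[i][k] * b[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangle_update_blocked(a, b, grid):
    """Same contraction, computed tile by tile on a grid x grid layout.

    Tile (bi, bj) stands in for one device. At step bk it only needs the
    (bi, bk) tile of `a` and the (bj, bk) tile of `b`, so no simulated
    device ever touches a full N x N matrix.
    """
    n = len(a)
    assert n % grid == 0, "toy example assumes N divisible by the grid size"
    s = n // grid  # tile edge length
    z = [[0] * n for _ in range(n)]
    for bi in range(grid):          # device row
        for bj in range(grid):      # device column
            for bk in range(grid):  # step over the contracted dimension
                for i in range(bi * s, (bi + 1) * s):
                    for j in range(bj * s, (bj + 1) * s):
                        z[i][j] += sum(a[i][k] * b[j][k]
                                       for k in range(bk * s, (bk + 1) * s))
    return z


# Small integer inputs so the two results can be compared exactly.
a = [[(3 * i + k) % 7 for k in range(6)] for i in range(6)]
b = [[(i + 2 * k) % 5 for k in range(6)] for i in range(6)]
assert triangle_update_blocked(a, b, grid=2) == triangle_update_full(a, b)
assert triangle_update_blocked(a, b, grid=3) == triangle_update_full(a, b)
```

In a real multi-GPU setting the inner `bk` loop would correspond to communication steps that bring the required tiles to each device; how Fold-CP schedules that communication is not described in this summary.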

Problem

Research questions and friction points this paper is trying to address.

biomolecular modeling

memory limitation

large-scale structure prediction

context parallelism

protein complex

Innovation

Methods, ideas, or system contributions that make the work stand out.

context parallelism

co-folding

multidimensional primitives