Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses structure prediction for large biomolecular assemblies, which single-GPU memory has limited to systems of a few thousand residues. The authors propose Fold-CP, a multi-GPU context parallelism framework that uses the Boltz models as reference architectures. Custom multidimensional parallel primitives distribute both the dense triangular updates and the window-batched local attention, so that per-device memory scales as O(N²/P) for an N-token input on P GPUs. This supports training and inference of co-folding models at scale, including structure prediction for assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. Fold-CP enabled scoring of over 90% of the CORUM database and folding of the PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping.

📝 Abstract
Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, which impose a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open-source reference architectures and implement custom multidimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling: for an N-token input distributed across P GPUs, per-device memory scales as $O(N^2/P)$, enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian Protein Complexes (CORUM) database, as well as folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.
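To make the $O(N^2/P)$ scaling concrete, the following is a minimal back-of-the-envelope sketch, not the Fold-CP implementation. It estimates the storage for a single N x N x C pair tensor sharded evenly across P devices. The channel width of 128, the 2-byte element size, and the square device grid are illustrative assumptions that the abstract does not state, and the function names are hypothetical.

```python
import math


def pair_bytes_per_device(n_tokens, n_devices, channels=128, bytes_per_elem=2):
    """Bytes held per device for one N x N x C pair tensor, sharded evenly.

    channels=128 and bytes_per_elem=2 (16-bit floats) are assumed values.
    """
    total = n_tokens * n_tokens * channels * bytes_per_elem
    return total / n_devices


def tile_shape(n_tokens, n_devices):
    """Tile edge lengths if the N x N plane is cut over a sqrt(P) x sqrt(P) grid.

    The square grid is a hypothetical layout chosen to keep the sketch simple.
    """
    side = math.isqrt(n_devices)
    if side * side != n_devices:
        raise ValueError("this sketch assumes a square device grid")
    edge = math.ceil(n_tokens / side)
    return edge, edge


if __name__ == "__main__":
    GIB = 1024 ** 3
    for p in (1, 16, 64):
        gib = pair_bytes_per_device(30_000, p) / GIB
        print(f"P={p:3d}: {gib:8.2f} GiB per device, tile={tile_shape(30_000, p)}")
```

Under these assumptions, one copy of the pair tensor for 30,000 tokens takes about 215 GiB on a single device and about 3.4 GiB per device on 64. Real training and inference hold many such activations at once, so these figures only illustrate the scaling trend, not a total memory budget.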
Problem

Research questions and friction points this paper is trying to address.

biomolecular modeling
memory limitation
large-scale structure prediction
context parallelism
protein complex
Innovation

Methods, ideas, or system contributions that make the work stand out.

context parallelism
co-folding
multidimensional primitives
local attention
memory scaling
Authors

Dejun Lin
NVIDIA
Simon Chu
NVIDIA
Vishanth Iyer
NVIDIA
Youhan Lee
NVIDIA
John St John
NVIDIA
Kevin Boyd
NVIDIA
Brian Roland
NVIDIA
Xiaowei Ren
Senior Deep Learning Architect, NVIDIA
Computer Architecture
Guoqing Zhou
Guilin University of Technology
Remote sensing
Zhonglin Cao
NVIDIA
Deep Learning, Molecular Dynamics, Nanofluidics, Computational Materials
Polina Binder
NVIDIA
Yuliya Zhautouskaya
NVIDIA
Jakub Zakrzewski
professor of physics, Jagiellonian University in Kraków
cold atoms, quantum chaos, disordered systems, many body localization, art
Maximilian Stadler
Technische Universität München
Machine Learning, Uncertainty Estimation
Kyle Gion
NVIDIA
Yuxing Peng
NVIDIA
Xi Chen
NVIDIA
Tianjing Zhang
NVIDIA
Philipp Junk
Rezo Therapeutics
Michelle Dimon
Rezo Therapeutics
Paweł Gniewek
Rezo Therapeutics
Fabian Ortega
Rezo Therapeutics
McKinley Polen
Rezo Therapeutics
Ivan Grubisic
Rezo Therapeutics
Ali Bashir
Rezo Therapeutics
Genomics, Proteomics, Computational Biology, Machine Learning