Boltz is a Strong Baseline for Atom-level Representation Learning

📅 2026-02-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
This work proposes leveraging protein–ligand co-folding information to enhance small-molecule representation learning, addressing the limitations of conventional single-modality pretraining. Building upon the Boltz2 co-folding model, the study introduces representation probing, knowledge distillation, and 3D conformation modeling to demonstrate, for the first time, that co-folding constitutes an effective molecular pretraining paradigm and exhibits complementarity with supervised signals. The resulting method matches or surpasses state-of-the-art models in ADMET prediction, substantially accelerates generative modeling and structure-guided optimization, and enables efficient representation-level supervision in reinforcement learning. These results highlight its strong potential as a plug-and-play foundation model for molecular applications.
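As a rough illustration of what representation probing can look like in practice, the sketch below fits a linear probe on frozen, pooled atom-level embeddings and scores it on held-out molecules. It is not the paper's protocol and it does not call Boltz: the embeddings are random placeholders, and the mean pooling, the ridge penalty, the embedding width, and the names `mean_pool`, `fit_ridge_probe`, and `r2_score` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical embedding width, not a Boltz setting


def mean_pool(atom_embeddings):
    """Collapse an (n_atoms, dim) array into one (dim,) molecule vector."""
    return np.asarray(atom_embeddings).mean(axis=0)


def fit_ridge_probe(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized bias term."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Placeholder data: random per-atom vectors stand in for frozen encoder
# outputs, and the target is a synthetic linear function of the pooled vector.
molecules = [rng.normal(size=(rng.integers(5, 30), DIM)) for _ in range(200)]
X = np.stack([mean_pool(m) for m in molecules])
true_w = rng.normal(size=DIM)
y = X @ true_w + 0.01 * rng.normal(size=len(X))

# The encoder stays frozen; only the linear probe is fit on the training split.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
w, b = fit_ridge_probe(X_train, y_train)
r2 = r2_score(y_test, X_test @ w + b)
print(f"held-out R^2 of the linear probe: {r2:.3f}")
```

With real data, the placeholder arrays would be replaced by per-atom embeddings extracted from the frozen model and the synthetic target by a measured property such as an ADMET endpoint. A high held-out score from a probe this simple would suggest the property is linearly accessible in the representation.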
📝 Abstract
Foundation models in molecular learning have advanced along two parallel tracks: protein models, which typically utilize evolutionary information to learn amino acid-level representations for folding, and small-molecule models, which focus on learning atom-level representations for property prediction tasks such as ADMET. Notably, cutting-edge protein-centric models such as Boltz now operate at atom-level granularity for protein-ligand co-folding, yet their atom-level expressiveness for small-molecule tasks remains unexplored. A key open question is whether these protein co-folding models capture transferable chemical physics or rely on protein evolutionary signals, which would limit their utility for small-molecule tasks. In this work, we investigate the quality of Boltz atom-level representations across diverse small-molecule benchmarks. Our results show that Boltz is competitive with specialized baselines on ADMET property prediction tasks and effective for molecular generation and optimization. These findings suggest that the representational capacity of cutting-edge protein-centric models has been underexplored and position Boltz as a strong baseline for atom-level representation learning for small molecules.
Problem

Research questions and friction points this paper is trying to address.

co-folding
small-molecule representation
protein-ligand interaction
foundation models
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-folding
small-molecule representation learning
Boltz2
representation transfer
molecular foundation model