Transformers Discover Molecular Structure Without Graph Priors

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional graph neural networks (GNNs) rely on predefined graph structures (e.g., radius cutoffs or k-nearest neighbors), which fix the receptive field and require sparse, structure-dependent computation, limiting both expressive power and inference efficiency. Method: a pure, unmodified Transformer that takes atomic Cartesian coordinates directly as input, with no graph construction and no physical priors, and predicts molecular potential energy and atomic forces end to end. Despite the absence of hard-coded biases, the model learns physically plausible attention patterns, with weights that decay inversely with interatomic distance and adapt across diverse molecular environments. Results: on OMol25, under a matched training compute budget, the Transformer reaches energy and force accuracy on par with a state-of-the-art equivariant GNN, and its performance improves predictably as training resources scale, consistent with empirical scaling laws from other domains. The work challenges the necessity of hard-coded graph inductive biases and points toward standardized, scalable, structure-agnostic architectures for molecular modeling.
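
To make the setup concrete, below is a minimal PyTorch sketch of a graph-free Transformer potential: atomic numbers and raw Cartesian coordinates go in, a standard encoder attends over all atom pairs, and a total energy comes out. The layer sizes, the linear coordinate embedding, and the gradient-based force computation are illustrative assumptions, not the authors' exact design (the paper may, for instance, predict forces with a direct head).

```python
import torch
import torch.nn as nn

class GraphFreeTransformer(nn.Module):
    def __init__(self, num_elements=100, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        self.atom_embed = nn.Embedding(num_elements, d_model)
        self.coord_proj = nn.Linear(3, d_model)  # raw Cartesian coordinates in, no graph
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.energy_head = nn.Linear(d_model, 1)

    def forward(self, atomic_numbers, coords):
        # atomic_numbers: (batch, n_atoms) long; coords: (batch, n_atoms, 3)
        # Every atom attends to every other atom: no cutoff radius, no k-NN graph.
        h = self.atom_embed(atomic_numbers) + self.coord_proj(coords)
        h = self.encoder(h)
        # Sum per-atom contributions into a total molecular energy.
        return self.energy_head(h).squeeze(-1).sum(dim=-1)

def energy_and_forces(model, atomic_numbers, coords):
    # Forces as the negative energy gradient is one common MLIP convention;
    # it is an assumption here, not a detail confirmed by the abstract.
    coords = coords.detach().requires_grad_(True)
    energy = model(atomic_numbers, coords)
    forces = -torch.autograd.grad(energy.sum(), coords, create_graph=True)[0]
    return energy, forces
```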

📝 Abstract
Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs often induced by a fixed radius cutoff or k-nearest neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slow down inference with sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinates, without predefined graphs or physical priors, can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patterns, such as attention weights that decay inversely with interatomic distance, and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.
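
The abstract's claim of predictable improvements with scale can be read as a standard empirical power law, mae ≈ a · C^slope in training compute C. A minimal sketch of fitting such a law in log-log space follows; the data points are placeholders, not results from the paper.

```python
import numpy as np

# Placeholder (compute, force-MAE) pairs; NOT numbers from the paper.
compute = np.array([1e18, 1e19, 1e20, 1e21])   # training FLOPs (hypothetical)
mae = np.array([12.0, 7.1, 4.2, 2.5])          # force MAE (hypothetical units)

# Fit log(mae) = slope * log(compute) + intercept, i.e. mae ≈ a * compute**slope.
slope, intercept = np.polyfit(np.log(compute), np.log(mae), 1)
print(f"scaling exponent: {slope:.3f}")  # negative slope => predictable gains with compute
```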
Problem

Research questions and friction points this paper is trying to address.

Investigating whether pure Transformers can approximate molecular energies and forces without predefined graphs
Challenging the necessity of hard-coded graph inductive biases in molecular modeling
Demonstrating that Transformers can learn physically consistent patterns adaptively from raw coordinates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformers trained directly on Cartesian coordinates
Learns attention patterns that decay with interatomic distance (see the probe sketch after this list)
Eliminates predefined molecular graph inductive biases
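
Below is a hedged sketch of the kind of probe the innovation bullets suggest: pull attention weights out of one self-attention module and compare them with pairwise interatomic distances. The use of nn.MultiheadAttention with batch_first=True and the off-diagonal aggregation are assumptions about a standard PyTorch setup, not the authors' analysis code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def attention_vs_distance(attn: nn.MultiheadAttention, h, coords):
    # attn: a MultiheadAttention module constructed with batch_first=True
    # h: (batch, n_atoms, d_model) hidden states entering this layer
    # coords: (batch, n_atoms, 3) Cartesian coordinates
    _, weights = attn(h, h, h, need_weights=True, average_attn_weights=True)
    dist = torch.cdist(coords, coords)  # (batch, n_atoms, n_atoms) pairwise distances
    offdiag = ~torch.eye(dist.shape[-1], dtype=torch.bool, device=dist.device)
    # Drop self-attention terms; return paired (distance, weight) samples.
    # Plotting weights against 1/dist, or computing a rank correlation, would
    # quantify the inverse-distance decay the paper reports.
    return dist[:, offdiag].flatten(), weights[:, offdiag].flatten()
```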
👥 Authors
Tobias Kreiman (UC Berkeley)
Yutong Bai (Postdoc, UC Berkeley) · Artificial Intelligence, Computer Vision, Deep Learning
Fadi Atieh (UC Berkeley)
Elizabeth Weaver (UC Berkeley)
Eric Qu (PhD Student, UC Berkeley) · Geometric Deep Learning, AI for Science, Sequence Modeling
Aditi S. Krishnapriyan (UC Berkeley, LBNL)