SMolLM: Small Language Models Learn Small Molecular Grammar

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current molecular language models are often parameter-heavy, and it remains unclear how they learn the grammar rules of chemistry. This work proposes SMolLM, a weight-shared Transformer with only 53K parameters that applies the same block over multiple passes. It generates SMILES with 95% validity on ZINC-250K and outperforms a standard GPT model with ten times as many parameters. Interpretability analyses show that the model satisfies syntactic constraints in a fixed order across passes: first parentheses (bracket matching), then ring closures, and finally valence rules. An ablation across attention heads and passes localizes the first bracket-matching step to a single attention head, offering mechanistic insight into how formal-language structure can be computed iteratively within a transformer.
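Why weight sharing keeps the parameter count small can be shown with standard transformer parameter arithmetic. The sketch below assumes a GPT-2-style block with biases, learned positional embeddings and an output head tied to the token embedding. The function names and the configuration (`d_model=64`, a 40-token vocabulary, a 128-token context, 8 passes) are hypothetical illustration values, not SMolLM's reported setup. A weight-shared model stores one block however many passes it runs, while a standard stack stores one block per layer.

```python
def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Parameters in one GPT-2-style transformer block (biases included)."""
    d, h = d_model, mlp_ratio * d_model
    attention = 4 * d * d + 4 * d   # Q, K, V and output projections; head count does not change this
    mlp = d * h + h + h * d + d     # two linear layers with biases
    norms = 2 * (2 * d)             # two LayerNorms, each with a gain and a bias vector
    return attention + mlp + norms  # equals 12*d*d + 13*d when mlp_ratio == 4


def model_params(d_model: int, vocab: int, context: int, stored_blocks: int) -> int:
    """Total parameters, assuming learned positions and a tied output head (hypothetical setup)."""
    embeddings = vocab * d_model + context * d_model  # token + positional embeddings
    final_norm = 2 * d_model
    return embeddings + final_norm + stored_blocks * block_params(d_model)


if __name__ == "__main__":
    d, vocab, context, passes = 64, 40, 128, 8  # hypothetical configuration
    shared = model_params(d, vocab, context, stored_blocks=1)       # one block reused `passes` times
    untied = model_params(d, vocab, context, stored_blocks=passes)  # one block per layer
    print(shared, untied, round(untied / shared, 1))
```

With these assumed values the script prints `60864 410752 6.7`, and the gap grows linearly with the number of passes. Weight sharing saves parameters, not compute: every pass still runs a full forward computation through the block.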
📝 Abstract
Language models for molecular design have scaled to hundreds of millions of parameters, yet how they learn chemical grammar is poorly understood. We train SMolLM, a 53K-parameter weight-shared transformer, to generate novel SMILES with 95% validity on the ZINC-250K drug-like-molecule benchmark, outperforming a standard GPT with 10 times more parameters. Mechanistically, the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as shown by error classification, linear probing, and sparse autoencoders. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head. Together, these results yield a compact, mechanistically interpretable molecular generator and a testbed for studying iterative computation in formal-language domains.
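The error classification mentioned in the abstract can be illustrated with a small rule-based checker. This is a hypothetical sketch, not the authors' classifier: the names `classify_smiles_error` and `validity_rate`, the simplified valence table and the error labels are assumptions. It handles only organic-subset atoms, bracket atoms (whose valence it skips), branches, ring labels and explicit bonds. It does not handle disconnected `.` fragments, aromaticity perception or every malformed input. Unbalanced branch parentheses are reported as `bracket`, unclosed or self-referencing ring closures as `ring`, and over-bonded atoms as `valence`. The end-of-string checks run in the brackets, rings, valence order the abstract describes.

```python
from typing import Iterable, Optional

# Hypothetical, simplified maximum valences for the SMILES organic subset.
# Real toolkits (e.g. RDKit) handle charges, hypervalent S/P and aromaticity properly.
MAX_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
               "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1, "/": 1, "\\": 1}


def classify_smiles_error(smiles: str) -> Optional[str]:
    """Return 'bracket', 'ring', 'valence' or 'token' for the first error found, else None."""
    valence: list[int] = []            # running bond-order sum per atom
    symbols: list[Optional[str]] = []  # element per atom; None for [bracket] atoms
    branch_stack: list[int] = []       # atom to return to at each ')'
    open_rings: dict[str, tuple[int, int]] = {}  # ring label -> (atom, bond order or 0)
    prev: Optional[int] = None         # atom the next bond attaches to
    bond: Optional[str] = None         # pending explicit bond symbol
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            if prev is None:
                return "bracket"  # branch with no atom to hang from
            branch_stack.append(prev)
            i += 1
        elif ch == ")":
            if not branch_stack:
                return "bracket"  # closing parenthesis without an opener
            prev = branch_stack.pop()
            i += 1
        elif ch in BOND_ORDER:
            bond = ch
            i += 1
        elif ch.isdigit() or ch == "%":
            if ch == "%":  # two-digit ring label such as %12
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or not label.isdigit():
                    return "token"
                i += 3
            else:
                label = ch
                i += 1
            if prev is None:
                return "ring"
            order = BOND_ORDER[bond] if bond else 0  # 0 means "not specified here"
            bond = None
            if label in open_rings:
                j, other = open_rings.pop(label)
                if j == prev:
                    return "ring"  # ring bond from an atom to itself
                order = max(order, other, 1)  # default ring bond is single
                valence[prev] += order
                valence[j] += order
            else:
                open_rings[label] = (prev, order)
        else:
            if ch == "[":  # bracket atom: H count and charge are explicit, so skip valence
                end = smiles.find("]", i)
                if end == -1:
                    return "token"
                sym, i = None, end + 1
            elif smiles[i:i + 2] in ("Cl", "Br"):
                sym, i = smiles[i:i + 2], i + 2
            elif ch in "BCNOPSFIbcnops":
                sym, i = ch, i + 1
            else:
                return "token"
            if prev is None and bond is not None:
                return "token"  # bond with nothing on its left
            idx = len(valence)
            valence.append(0)
            symbols.append(sym)
            if prev is not None:
                order = BOND_ORDER[bond] if bond else 1
                valence[prev] += order
                valence[idx] += order
            prev, bond = idx, None
    if not valence or bond is not None:
        return "token"
    if branch_stack:  # end-of-string checks: brackets, then rings, then valence
        return "bracket"
    if open_rings:
        return "ring"
    for sym, v in zip(symbols, valence):
        if sym is not None and v > MAX_VALENCE[sym.upper() if len(sym) == 1 else sym]:
            return "valence"
    return None


def validity_rate(samples: Iterable[str]) -> float:
    """Fraction of samples with no detected error (a stand-in for a validity metric)."""
    samples = list(samples)
    return sum(classify_smiles_error(s) is None for s in samples) / len(samples)
```

For example, `classify_smiles_error("CC(C")` returns `"bracket"`, `classify_smiles_error("C1CC")` returns `"ring"`, and `validity_rate(["CCO", "CC(=O)O", "CC(C", "C1CC"])` returns `0.5`.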
Problem

Research questions and friction points this paper is trying to address.

molecular design
chemical grammar
language models
SMILES
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

small language models
molecular grammar
SMILES generation
interpretable attention
iterative computation