🤖 AI Summary
Current molecular language models are often parameter-heavy, yet how they learn chemical grammar is poorly understood. This work proposes SMolLM, a weight-shared Transformer with only 53K parameters that generates novel SMILES with 95% validity on ZINC-250K, outperforming a standard GPT with 10 times more parameters. Interpretability analyses (error classification, linear probing, and sparse autoencoders) show that the same block resolves syntactic constraints across passes in a fixed order: brackets first, then ring closures, and finally valence. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head. The result is a compact, mechanistically interpretable molecular generator and a testbed for studying iterative computation in formal-language domains.
📝 Abstract
Language models for molecular design have scaled to hundreds of millions of parameters, yet how they learn chemical grammar is poorly understood. We train SMolLM, a 53K-parameter weight-shared transformer, to generate novel SMILES with 95% validity on the ZINC-250K drug-like-molecule benchmark, outperforming a standard GPT with 10 times more parameters. Mechanistically, the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as shown by error classification, linear probing, and sparse autoencoders. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head. Together, these results yield a compact, mechanistically interpretable molecular generator and a testbed for studying iterative computation in formal-language domains.
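To make the "brackets first, rings second" ordering concrete, the sketch below classifies a SMILES string by the first syntactic check it fails. It is an illustration only, not the authors' error classifier: the function names (`check_brackets`, `check_rings`, `first_syntax_error`) are hypothetical, and the checks cover only bracket balance and ring-closure pairing. Valence is deliberately left out, since checking it would need a chemistry toolkit such as RDKit.

```python
DIGITS = "0123456789"


def check_brackets(smiles: str) -> bool:
    """True if '(' / ')' and '[' / ']' are balanced and properly nested."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def check_rings(smiles: str) -> bool:
    """True if every ring-closure label is opened and closed exactly in pairs."""
    open_labels = set()
    in_atom = False  # digits inside [...] are isotopes, H counts, or charges
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_atom = True
        elif ch == "]":
            in_atom = False
        elif not in_atom:
            label = None
            if ch in DIGITS:
                label = int(ch)
            elif ch == "%":
                # Two-digit ring label, written as %nn.
                two = smiles[i + 1 : i + 3]
                if len(two) != 2 or any(c not in DIGITS for c in two):
                    return False
                label = int(two)
                i += 2
            if label is not None:
                # First occurrence opens the ring, second occurrence closes it.
                open_labels ^= {label}
        i += 1
    return not open_labels


def first_syntax_error(smiles: str):
    """Return 'bracket', 'ring', or None, testing in that fixed order.

    Valence is NOT checked here (assumption: that step is delegated to a
    chemistry toolkit), so None means only that these two checks passed.
    """
    if not check_brackets(smiles):
        return "bracket"
    if not check_rings(smiles):
        return "ring"
    return None
```

For example, `first_syntax_error("CC(C)(C")` returns `"bracket"`, `first_syntax_error("C1CCCC")` returns `"ring"`, and `first_syntax_error("c1ccccc1")` returns `None`. Counting these labels over samples decoded after each pass is one way such an ordering could be observed in practice.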