Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders

📅 2025-12-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Chemical language models (CLMs) remain largely opaque “black boxes,” hindering understanding of how they internally represent chemical knowledge. Method: To address this, we introduce sparse autoencoders (SAEs)—a first application in CLM interpretability—to systematically decode hidden-layer activation patterns of the foundational model SMI-TED across diverse molecular datasets. Contribution/Results: Our approach successfully identifies numerous semantically coherent, interpretable features—including molecular structural motifs, physicochemical properties, and pharmacological categories—establishing a direct, human-understandable mapping between neural activations and domain-specific chemical concepts. This confirms CLMs’ intrinsic capacity to encode established chemical priors and yields the first general-purpose knowledge discovery framework tailored for chemistry AI. By bridging neural representations with expert-validated chemical semantics, our work establishes a novel paradigm for developing trustworthy, interpretable large-scale chemical models.

Technology Category

Application Category

📝 Abstract
Since the advent of machine learning, interpretability has remained a persistent challenge, becoming increasingly urgent as generative models support high-stakes applications in drug and material discovery. Recent advances in large language model (LLM) architectures have yielded chemistry language models (CLMs) with impressive capabilities in molecular property prediction and molecular generation. However, how these models internally represent chemical knowledge remains poorly understood. In this work, we extend sparse autoencoder techniques to uncover and examine interpretable features within CLMs. Applying our methodology to the Foundation Models for Materials (FM4M) SMI-TED chemistry foundation model, we extract semantically meaningful latent features and analyse their activation patterns across diverse molecular datasets. Our findings reveal that these models encode a rich landscape of chemical concepts. We identify correlations between specific latent features and distinct domains of chemical knowledge, including structural motifs, physicochemical properties, and pharmacological drug classes. Our approach provides a generalisable framework for uncovering latent knowledge in chemistry-focused AI systems. This work has implications for both foundational understanding and practical deployment; with the potential to accelerate computational chemistry research.
Problem

Research questions and friction points this paper is trying to address.

Uncover interpretable features in chemistry language models
Analyze activation patterns across diverse molecular datasets
Identify correlations between latent features and chemical knowledge domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders extract interpretable latent features
Analyze activation patterns across diverse molecular datasets
Correlate latent features with chemical knowledge domains
🔎 Similar Papers
No similar papers found.
Jaron Cohen
Jaron Cohen
MSc Advanced Computer Science @ University of Oxford
A
Alexander G. Hasson
Department of Oncology, University of Oxford
S
Sara Tanovic
Department of Chemistry, University of Oxford