Training a Scientific Reasoning Model for Chemistry

📅 2025-06-04
🏛️ arXiv.org
📈 Citations: 13
Influential: 2
🤖 AI Summary
Existing chemical language models typically require domain-specific pretraining, which limits data efficiency and generalizability across diverse experimental reasoning tasks. Method: the authors propose building high-performance chemical reasoning models via post-training alone, with no domain-specific pretraining. Starting from Mistral-Small-24B, they apply reinforcement-learning-based chain-of-thought fine-tuning on 640,730 experimentally annotated chemistry problems, enabling joint natural-language and SMILES-based structural reasoning across 375 experiment-driven tasks, including synthetic feasibility, pharmacokinetics, receptor activity, and odor prediction. Contribution/Results: the resulting model, ether0, is presented as the first chemical reasoning model trained with zero domain pretraining, is over an order of magnitude more data-efficient than specialized models, and outperforms state-of-the-art general-purpose and multimodal chemical foundation models, and even human experts, on molecular design benchmarks.

📝 Abstract
Reasoning models are large language models that emit a long chain-of-thought before answering, providing both higher accuracy and explicit reasoning for their response. A major question has been whether language model reasoning generalizes beyond mathematics, programming, and logic, where most previous work has focused. We demonstrate that reasoning models can be post-trained for chemistry without additional domain pretraining, and require substantially less data compared to contemporary domain-specific models. We report ether0, a 24B parameter LLM (based on Mistral-Small-24B) that can reason in natural language and respond with chemical structures. This reasoning model was trained with reinforcement learning on 640,730 experimentally-grounded chemistry problems across 375 tasks ranging from synthesizability, to blood-brain barrier permeability, to human receptor activity, to scent. Our model exceeds general-purpose chemistry models, frontier models, and human experts on molecular design tasks. It is also more data efficient relative to specialized models. We anticipate that this method can be applied to train data-efficient language models specialized for tasks across a wide variety of scientific domains.
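The abstract describes reinforcement learning on experimentally grounded problems whose answers are chemical structures, which implies a programmatic reward that scores a model's SMILES answer against a reference. The paper's actual task-specific verifiers are not reproduced here; the following is a minimal, purely illustrative sketch in which the reward checks that the final line of a completion looks like a SMILES string and matches the reference exactly (a hypothetical simplification; real verifiers would canonicalize structures, e.g. with a cheminformatics toolkit):

```python
import re

# Hypothetical reward for an RL rollout on a molecular-design task.
# The real ether0 verifiers are task-specific and richer than this;
# here we only check that (1) the completion's final line contains a
# plausible SMILES token and (2) it matches the reference exactly.
SMILES_TOKEN = re.compile(r"^[A-Za-z0-9@+\-\[\]\(\)=#\\/%.]+$")

def reward(completion: str, reference: str) -> float:
    # Treat the last non-empty line of the chain-of-thought as the answer.
    answer = completion.strip().splitlines()[-1].strip()
    if not SMILES_TOKEN.match(answer):
        return 0.0  # malformed output earns no credit
    return 1.0 if answer == reference else 0.0

# Example: aspirin's SMILES as the reference answer
ref = "CC(=O)Oc1ccccc1C(=O)O"
print(reward("Reasoning...\nCC(=O)Oc1ccccc1C(=O)O", ref))  # 1.0
print(reward("Reasoning...\nnot a molecule!", ref))        # 0.0
```

Exact string matching under-rewards correct molecules written in non-canonical form, which is why practical pipelines canonicalize both sides before comparison.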
Problem

Research questions and friction points this paper is trying to address.

Trains a reasoning model for chemistry without domain pretraining
Addresses generalization of reasoning models beyond math and logic
Improves data efficiency for specialized scientific domain tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-trained reasoning model for chemistry without domain pretraining
Reinforcement learning on 640,730 chemistry problems across 375 tasks
ether0 exceeds general-purpose chemistry models, frontier models, and human experts on molecular design tasks