MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) show limited improvement in chemical reasoning from rule-based online reinforcement learning unless the base model already assigns sufficiently high initial probability to the correct answer, a property termed "latent solvability." Method: the proposed Mid-stage Scientific Training (MiST) framework comprises SMILES/CIF structure-aware pre-processing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens, systematically strengthening symbolic manipulation skills and latent chemical knowledge. Contribution/Results: this work formally defines the dual prerequisites for chemical reasoning and quantifies latent solvability for the first time. Experiments show MiST raises latent solvability by up to 1.8× on 3B and 7B models; top-1 accuracy on organic reaction naming rises from 10.9% to 63.9%, and inorganic material generation accuracy increases from 40.6% to 67.4%. Moreover, MiST produces interpretable, stepwise reasoning chains, demonstrating both fidelity and explainability in scientific reasoning.
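The summary names SMILES/CIF structure-aware pre-processing, but the paper's exact pipeline is not reproduced here. The sketch below is a minimal, hypothetical example of one plausible step, SMILES canonicalization with RDKit; `canonicalize_smiles` is an invented helper, and CIF handling is omitted.

```python
# Rough illustration only: the paper's pre-processing code is not shown here.
# One plausible ingredient of "SMILES-aware" pre-processing is canonicalizing
# SMILES strings and dropping unparseable ones, e.g. with RDKit.
from rdkit import Chem

def canonicalize_smiles(smiles: str) -> str | None:
    """Return a canonical SMILES string, or None if the input does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # flag/drop malformed structures rather than train on them
    return Chem.MolToSmiles(mol, canonical=True)

# Two spellings of ethanol collapse to a single canonical form:
assert canonicalize_smiles("OCC") == canonicalize_smiles("CCO")
```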

📝 Abstract
Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards. However, recent studies reveal a critical constraint: reinforcement learning succeeds only when the base model already assigns non-negligible probability to correct answers, a property we term 'latent solvability'. This work investigates the emergence of chemical reasoning capabilities and what these prerequisites mean for chemistry. We identify two necessary conditions for RL-based chemical reasoning: 1) symbolic competence, and 2) latent chemical knowledge. We propose mid-stage scientific training (MiST): a set of mid-stage training techniques to satisfy these conditions, including data-mixing with SMILES/CIF-aware pre-processing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens. These steps raise the latent-solvability score on 3B and 7B models by up to 1.8x, and enable RL to lift top-1 accuracy from 10.9% to 63.9% on organic reaction naming, and from 40.6% to 67.4% on inorganic material generation. Similar results are observed on other challenging chemical tasks, while producing interpretable reasoning traces. Our results define clear prerequisites for chemical reasoning training and highlight the broader role of mid-stage training in unlocking reasoning capabilities.
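Given the abstract's definition, a latent-solvability score can be estimated by sampling, as in the hypothetical sketch below: the score is the fraction of k base-model completions that a rule-based checker accepts. `generate`, `is_correct`, and k = 64 are placeholder assumptions, not the paper's exact metric.

```python
# Hypothetical Monte Carlo estimate of a latent-solvability score: the
# fraction of k base-model samples that already contain the correct answer.
# RL with rule-based rewards has learning signal only when this is non-zero.
from typing import Callable

def latent_solvability(
    prompt: str,
    reference: str,
    generate: Callable[[str], str],          # one sample from the *base* model
    is_correct: Callable[[str, str], bool],  # rule-based answer check
    k: int = 64,
) -> float:
    hits = sum(is_correct(generate(prompt), reference) for _ in range(k))
    return hits / k  # approximates P(correct | prompt) under the base model
```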
Problem

Research questions and friction points this paper is trying to address.

Enabling chemical reasoning in LLMs via mid-stage training
Addressing prerequisites for reinforcement learning in chemistry tasks
Improving latent solvability for organic and inorganic chemical problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mid-stage training with SMILES/CIF-aware pre-processing and data-mixing (a toy mixing sketch follows this list)
Continued pre-training on 2.9B tokens and supervised fine-tuning on 1B tokens
Enhancing latent solvability to enable reinforcement learning for chemical reasoning
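As a toy sketch of the data-mixing step named above, the snippet below draws each training document's source corpus according to fixed weights. The corpus names and ratios are invented for illustration; the paper reports only the overall 2.9B-token continued pre-training budget, not the recipe.

```python
# Invented mixture for illustration: sample which corpus the next
# continued-pre-training document comes from (the real recipe is not given).
import random

MIXTURE = {"chemistry_text": 0.5, "smiles_cif_records": 0.3, "general_web": 0.2}

def sample_corpus(rng: random.Random) -> str:
    """Pick the source corpus for the next training document."""
    names, weights = zip(*MIXTURE.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_corpus(rng) for _ in range(8)])  # e.g. one batch's source mix
```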
Andres M Bran
Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Station 6, Lausanne, CH-1015, Switzerland.
Tong Xie
Green Dynamics & University of New South Wales
Solar Cells · Large Language Models · Cheminformatics · Nano Materials
Shai Pranesh
Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Station 6, Lausanne, CH-1015, Switzerland.
Jeffrey Meng
School of Photovoltaic and Renewable Energy Engineering, University of New South Wales (UNSW), Kensington, 2033, Australia.
Xuan Vu Nguyen
Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Station 6, Lausanne, CH-1015, Switzerland.
Jeremy Goumaz
Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Station 6, Lausanne, CH-1015, Switzerland.
David Ming Segura
Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Station 6, Lausanne, CH-1015, Switzerland.
Ruizhi Xu
School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington, 2033, Australia.
Dongzhan Zhou
Researcher at Shanghai AI Lab
AI4Science · Computer Vision · Deep Learning
Wenjie Zhang
School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington, 2033, Australia.
Bram Hoex
Professor, UNSW Sydney
Solar Energy · Solar Cells · Surface Passivation · Advanced Characterisation
Philippe Schwaller
Assistant Professor, Laboratory of Artificial Chemical Intelligence - EPFL
Deep Learning · ML for Chemistry · Reaction Prediction · Synthesis Planning · Accelerated Discovery