🤖 AI Summary
Problem: Large language models (LLMs) exhibit shallow domain understanding, weak reasoning capabilities, and poor interpretability in chemistry. Method: We propose an atomic chemical knowledge representation and mix-sourced distillation framework: (1) constructing a fine-grained, structured dataset of atomized chemical knowledge points; and (2) integrating expert rule injection, mix-sourced knowledge distillation (from general corpora and domain-specific texts), and chemistry-aware reinforcement learning to guide the generation of traceable, logically coherent reasoning chains. Contribution/Results: Our approach significantly improves accuracy and transparency in chemical reaction prediction and molecular property inference, achieving state-of-the-art performance across multiple chemical benchmarks. Crucially, the generated reasoning processes support human-in-the-loop verification, ensuring scientific rigor while maintaining practical applicability.
📝 Abstract
While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. We then propose a mix-sourced distillation strategy that integrates expert-curated knowledge with general-domain reasoning skills, followed by domain-specific reinforcement learning to enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves state-of-the-art performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains improve the reliability, transparency, and practical utility of the model in real-world human-AI collaboration scenarios.