🤖 AI Summary
To address the limited reasoning reliability and interpretability of AI in chemical research, this work proposes a large language model (LLM) enhancement paradigm jointly driven by knowledge graph embedding and formal logical constraints, yielding the open-source KALE-LM-Chem(-1.5) model series. Methodologically, it integrates chemical knowledge graphs, first-order logic rule injection, instruction tuning, and chain-of-thought distillation to enable multi-step scientific reasoning and verifiable conclusion generation. Its key contribution is the first synergistic incorporation of structured logical constraints and semantic knowledge embeddings into LLM training, significantly improving accuracy and interpretability on tasks such as reaction prediction and molecular property inference and consistently outperforming general-purpose baselines. The models and code are publicly released, advancing trustworthy AI for Science.
📝 Abstract
Artificial intelligence is increasingly demonstrating its immense potential, and growing attention is being paid to how AI can be harnessed to advance scientific research. In this vision paper, we present our perspectives on how AI can better assist scientific inquiry and explore corresponding technical approaches. We propose and open-source two large models in our KALE-LM series, KALE-LM-Chem and KALE-LM-Chem-1.5, which achieve outstanding performance on chemistry-related tasks. We hope that our work serves as a strong starting point toward more intelligent AI, promoting the advancement of science and technology as well as societal development.