Discovery of Interpretable Physical Laws in Materials via Language-Model-Guided Symbolic Regression

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem that traditional symbolic regression often yields overly complex, physically meaningless expressions when applied to high-dimensional materials data. The authors propose an approach that, for the first time, integrates large language models (LLMs) into the symbolic regression pipeline. The scientific knowledge embedded in LLMs guides the search direction, and pruning based on combinatorial optimization narrows the candidates further; together these steps shrink the solution space (by a factor of approximately $10^5$, per the abstract) while keeping the resulting formulas physically interpretable. The approach discovers new analytical expressions for key material properties, including perovskite bulk modulus, band gap, and oxygen evolution reaction activity, that outperform existing formulas in accuracy, simplicity, and physical plausibility.

📝 Abstract
Discovering interpretable physical laws from high-dimensional data is a fundamental challenge in scientific research. Traditional methods, such as symbolic regression, often produce complex, unphysical formulas when searching a vast space of possible forms. We introduce a framework that guides the search process by leveraging the embedded scientific knowledge of large language models, enabling efficient identification of physical laws in the data. We validate our approach by modeling key properties of perovskite materials. Our method mitigates the combinatorial explosion commonly encountered in traditional symbolic regression, reducing the effective search space by a factor of approximately $10^5$. A set of novel formulas for bulk modulus, band gap, and oxygen evolution reaction activity is identified; these formulas not only provide meaningful physical insights but also outperform previous formulas in accuracy and simplicity.
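
To make the combinatorial explosion mentioned in the abstract concrete, the following is a minimal counting sketch, not the authors' method. It assumes a deliberately simplified formula template (a chain of terms, each a feature optionally wrapped in a unary operator, joined by binary operators). The function name `count_candidates` and all feature and operator counts are hypothetical values chosen for illustration; they are not taken from the paper and are not meant to reproduce its reported reduction factor.

```python
def count_candidates(n_features, n_unary_ops, n_binary_ops, n_terms):
    """Count formulas of the form t1 (op) t2 (op) ... (op) tn.

    Each term is one feature, either bare or wrapped in one unary
    operator, so there are n_features * (n_unary_ops + 1) choices per term.
    Adjacent terms are joined by one of n_binary_ops binary operators.
    This template is an illustrative assumption, not the paper's grammar.
    """
    per_term = n_features * (n_unary_ops + 1)
    return per_term ** n_terms * n_binary_ops ** (n_terms - 1)


# Hypothetical unguided search: every descriptor and operator is allowed.
unguided = count_candidates(n_features=20, n_unary_ops=5,
                            n_binary_ops=4, n_terms=3)

# Hypothetical guided search: prior knowledge keeps only a few
# physically plausible descriptors and operators.
guided = count_candidates(n_features=4, n_unary_ops=1,
                          n_binary_ops=2, n_terms=3)

reduction = unguided / guided

print(f"unguided candidates: {unguided:,}")
print(f"guided candidates:   {guided:,}")
print(f"reduction factor:    {reduction:,.0f}")
```

Even in this toy template the candidate count grows as a power of the number of allowed features and operators, so restricting either one shrinks the space multiplicatively. That is the general mechanism by which knowledge-guided restriction of the search can produce reductions of several orders of magnitude.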

Problem

Research questions and friction points this paper is trying to address.

symbolic regression
interpretable physical laws
high-dimensional data
materials science
combinatorial explosion

Innovation

Methods, ideas, or system contributions that make the work stand out.

language-model-guided symbolic regression
interpretable physical laws
perovskite materials
combinatorial explosion mitigation
scientific discovery
Yifeng Guan
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
Chuyi Liu
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
Dongzhan Zhou
Researcher at Shanghai AI Lab
AI4Science, computer vision, deep learning
Lei Bai
Shanghai AI Laboratory
Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery
Wan-jian Yin
Soochow Institute for Energy and Materials Innovations (SIEMIS), Soochow University, Suzhou, 215006, China.
Jingyuan Li
University of Washington
Mao Su
Shanghai AI Laboratory
Physics, AI