Endowing Molecular Language with Geometry Perception via Modality Compensation for High-Throughput Quantum Hamiltonian Prediction

📅 2026-01-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Quantum Hamiltonian prediction typically relies on costly molecular geometry data, which limits high-throughput computation. This work proposes a geometric modality compensation mechanism that gives SMILES-based molecular language models implicit geometric awareness through multimodal alignment, enabling efficient Hamiltonian prediction without explicit geometric inputs. A weakly supervised fine-tuning strategy further improves data efficiency. Theoretical analysis shows that the generalization error of prediction without explicit geometry can be bounded under the compensation scheme. Experiments report up to a 100-fold speedup over conventional quantum mechanical methods at comparable accuracy, and the approach is demonstrated in a case study on electrolyte formulation screening.

πŸ“ Abstract
The quantum Hamiltonian is a fundamental property that governs a molecule's electronic structure and behavior, and its calculation and prediction are paramount in computational chemistry and materials science. Accurate prediction relies heavily on extensive training data, including precise molecular geometries and Hamiltonian matrices, which are expensive to acquire by either experimental or computational methods. Towards a fast yet accurate method for Hamiltonian prediction, we first introduce a geometry-aware molecular language model that bypasses expensive molecular geometries by using only the readily available molecular language, the simplified molecular input line entry system (SMILES). Our method employs multimodal alignment to bridge SMILES strings and their corresponding molecular geometries. Recognizing that the molecular language inherently lacks explicit geometric information, we propose a geometry modality compensation strategy to imbue molecular language representations with essential geometric features, thereby enabling accurate predictions from SMILES alone. In addition, given the high cost of acquiring Hamiltonian data, we devise a weakly supervised strategy to fine-tune the molecular language model, thus improving data efficiency. Theoretically, we prove that the prediction generalization error without explicit molecular geometry can be bounded through our modality compensation scheme. Empirically, our method achieves superior computational efficiency, providing up to 100x speedup over conventional quantum mechanical methods while maintaining comparable prediction accuracy. We further demonstrate our approach in a practical case study on the screening of electrolyte formulations.
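The abstract does not state which objective is used for the multimodal alignment between SMILES strings and molecular geometries. As an illustration only, the sketch below uses a symmetric contrastive (InfoNCE-style) loss, a common choice for aligning two modalities; it is an assumption, not the authors' method. The function names, the temperature value, and the random stand-in embeddings are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean of -log softmax(row_i)[i]: row i should match column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(smiles_emb, geom_emb, temperature=0.07):
    """Contrastive loss pulling paired SMILES/geometry embeddings together.

    Assumption: row i of smiles_emb and row i of geom_emb describe the
    same molecule; all other rows in the batch act as negatives.
    """
    s = [normalize(v) for v in smiles_emb]
    g = [normalize(v) for v in geom_emb]
    # Cosine similarity between every SMILES/geometry pair, scaled.
    logits = [
        [sum(a * b for a, b in zip(si, gj)) / temperature for gj in g]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the SMILES-to-geometry and geometry-to-SMILES directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Random stand-ins for encoder outputs (8 molecules, 16-dim embeddings).
random.seed(0)
geom = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in geom]
mismatched = aligned[1:] + aligned[:1]  # pair each molecule with the wrong geometry

loss_aligned = symmetric_info_nce(aligned, geom)
loss_mismatched = symmetric_info_nce(mismatched, geom)
print(loss_aligned, loss_mismatched)
```

In this toy setup the loss is low when each SMILES embedding sits close to its own geometry embedding and high when the pairing is shifted, which is the behavior an alignment objective of this kind is meant to reward.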
Problem

Research questions and friction points this paper is trying to address.

quantum Hamiltonian prediction
molecular geometry
SMILES
data efficiency
high-throughput screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry modality compensation
molecular language model
quantum Hamiltonian prediction
weakly supervised learning
multimodal alignment
Zhenzhong Wang
Department of Artificial Intelligence, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, School of Informatics, Xiamen University, Xiamen 361005, Fujian, P.R. China
Yongjie Hou
School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, Fujian, P.R. China
Chenggong Huang
Department of Artificial Intelligence, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, School of Informatics, Xiamen University, Xiamen 361005, Fujian, P.R. China
Yuxuan Du
Nanyang Technological University
Quantum machine learning, Quantum computing, AI for Quantum Science
Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining
Min Jiang
Department of Artificial Intelligence, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, School of Informatics, Xiamen University, Xiamen 361005, Fujian, P.R. China