Evaluating Effects of Augmented SELFIES for Molecular Understanding Using QK-LSTM

📅 2025-04-29
🤖 AI Summary
Early identification of molecular adverse effects is critical in drug discovery, yet conventional molecular representations such as SMILES are limited in robustness and generalizability. To address this, we propose an augmented SELFIES representation and systematically evaluate it in both classical and hybrid quantum-classical models. We integrate data augmentation into SELFIES and couple it with a quantum kernel-based LSTM (QK-LSTM) to build an end-to-end molecular representation learning framework. Experimental results show that augmented SELFIES outperforms augmented SMILES by 5.97% with a classical LSTM and by 5.91% with QK-LSTM on molecular property and adverse effect prediction tasks, and that these improvements are statistically significant. This work establishes a scalable, sequence-based modeling approach for quantum-enhanced molecular AI and supports more reliable early-stage screening of drug candidates.

📝 Abstract
Identifying molecular properties, including side effects, is a critical yet time-consuming step in drug development. Failing to detect these side effects before regulatory submission can result in significant financial losses and production delays, and overlooking them during regulatory review can lead to catastrophic consequences. This challenge presents an opportunity for innovative machine learning approaches, particularly hybrid quantum-classical models such as the Quantum Kernel-Based Long Short-Term Memory (QK-LSTM) network. The QK-LSTM integrates quantum kernel functions into the classical LSTM framework, enabling it to capture complex, non-linear patterns in sequential data. By mapping input data into a high-dimensional quantum feature space, the QK-LSTM model reduces the need for large parameter sets, allowing model compression without sacrificing accuracy in sequence-based tasks. Recent advances have been made in the classical domain using augmented variants of the Simplified Molecular Input Line-Entry System (SMILES). However, to the best of our knowledge, no research has explored the impact of augmented SMILES in the quantum domain, nor the role of augmented Self-Referencing Embedded Strings (SELFIES) in either classical or hybrid quantum-classical settings. This study presents the first analysis of these approaches, providing novel insights into their potential for enhancing molecular property prediction and side effect identification. Results reveal that augmenting SELFIES yields statistically significant improvements over SMILES: 5.97% in the classical domain and 5.91% in the hybrid quantum-classical domain.
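
To make the idea of a kernel-based LSTM gate concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the feature map or how the kernels enter the gates, so everything here is an assumption: the feature map is taken to be a per-qubit RY angle encoding (whose fidelity kernel has the closed form k(x, y) = prod_i cos^2((x_i - y_i) / 2) and can be simulated classically), and the gate is taken to be a sigmoid over a weighted sum of kernel values against a set of reference vectors. The names `fidelity_kernel`, `kernel_gate`, `references`, and `alphas` are hypothetical.

```python
import math


def fidelity_kernel(x, y):
    """Classically simulated fidelity kernel for an RY angle-encoding map.

    Assumption: each feature x_i is encoded on its own qubit as RY(x_i)|0>,
    so the state overlap factorizes and
    k(x, y) = prod_i cos^2((x_i - y_i) / 2).
    """
    k = 1.0
    for xi, yi in zip(x, y):
        k *= math.cos((xi - yi) / 2.0) ** 2
    return k


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def kernel_gate(v, references, alphas, bias):
    """One scalar gate activation (hypothetical form):
    sigmoid(sum_j alpha_j * k(v, r_j) + bias).

    The trainable parameters are the weights alpha_j and the bias; the
    reference vectors r_j are fixed inputs to the kernel.
    """
    s = sum(a * fidelity_kernel(v, r) for a, r in zip(alphas, references))
    return sigmoid(s + bias)


# Hypothetical values, for illustration only.
h_prev = [0.1, -0.3]          # previous hidden state
x_t = [0.7]                   # current input feature
v_t = h_prev + x_t            # concatenated gate input [h_{t-1}; x_t]
references = [[0.0, 0.0, 0.0], [0.5, -0.5, 1.0]]
alphas = [0.8, -0.4]
gate = kernel_gate(v_t, references, alphas, bias=0.1)
```

In this sketch the gate has one weight per reference vector plus a bias, rather than a full weight matrix over the concatenated input, which is one way a kernel formulation could lower the parameter count. On quantum hardware the kernel value would be estimated from circuit measurements instead of the closed form used here.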
Problem

Research questions and friction points this paper is trying to address.

Identifying molecular properties and side effects efficiently
Exploring augmented SELFIES in quantum-classical molecular modeling
Improving molecular prediction accuracy with hybrid QK-LSTM
Innovation

Methods, ideas, or system contributions that make the work stand out.

QK-LSTM integrates quantum kernel with LSTM
Augmented SELFIES improves molecular prediction accuracy
Quantum feature space reduces parameter needs
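
Because both the classical LSTM and the QK-LSTM consume sequences, a SELFIES string has to be split into symbols and mapped to integer indices before it can be fed to either model. The sketch below shows one plain-Python way to do that; it is an illustration under stated assumptions, not the paper's preprocessing pipeline. The helper names (`tokenize_selfies`, `build_vocab`, `encode`) are hypothetical, the two example strings are illustrative, and `[nop]` is used as the padding symbol. Generating the augmented string variants themselves requires a cheminformatics toolkit and is not shown.

```python
import re

# A SELFIES string is a sequence of bracketed symbols; "." separates fragments.
SELFIES_SYMBOL = re.compile(r"\[[^\[\]]+\]|\.")


def tokenize_selfies(selfies):
    """Split a SELFIES string into its symbols, rejecting malformed input."""
    tokens = SELFIES_SYMBOL.findall(selfies)
    if "".join(tokens) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return tokens


def build_vocab(token_lists, pad="[nop]"):
    """Assign an integer index to every symbol; index 0 is the padding symbol."""
    vocab = {pad: 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len, pad="[nop]"):
    """Map symbols to indices, truncating or right-padding to max_len."""
    ids = [vocab[tok] for tok in tokens][:max_len]
    return ids + [vocab[pad]] * (max_len - len(ids))


# Illustrative strings (benzene and ethanol), not taken from the paper's data.
benzene = tokenize_selfies("[C][=C][C][=C][C][=C][Ring1][=Branch1]")
ethanol = tokenize_selfies("[C][C][O]")
vocab = build_vocab([benzene, ethanol])
ethanol_ids = encode(ethanol, vocab, max_len=5)   # [1, 1, 5, 0, 0]
```

Every augmented variant of a molecule would pass through the same tokenizer and vocabulary, so augmentation increases the number of training sequences without changing the model's input format.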