Hierarchical Structure-Property Alignment for Data-Efficient Molecular Generation and Editing

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
In AI-driven drug discovery, molecular generation and editing face two key challenges: (1) structure-property relationships are complex and hard to model, and (2) multi-property annotations are sparse and incomplete. To address these, the authors propose HSPAG, a data-efficient framework that aligns structural representations with property labels at three granularities: atom, substructure, and whole molecule. Pre-training data is reduced by selecting representative samples through scaffold-based clustering and hard samples through an auxiliary variational autoencoder (VAE). A property relevance-aware masking mechanism and diversified perturbation strategies strengthen the cross-modal alignment between SMILES strings and multi-property labels when annotations are sparse. According to the abstract, the method reduces reliance on large annotated datasets while supporting controllable generation under multiple property constraints, and two real-world case studies validate its editing capabilities.

📝 Abstract
Property-constrained molecular generation and editing are crucial in AI-driven drug discovery but remain hindered by two factors: (i) the complex relationships between molecular structures and multiple properties are difficult to capture, and (ii) the narrow coverage and incomplete annotations of molecular properties weaken the effectiveness of property-based models. To tackle these limitations, we propose HSPAG, a data-efficient framework featuring hierarchical structure-property alignment. By treating SMILES and molecular properties as complementary modalities, the model learns their relationships at the atom, substructure, and whole-molecule levels. Moreover, we select representative samples through scaffold clustering and hard samples via an auxiliary variational autoencoder (VAE), substantially reducing the required pre-training data. In addition, we incorporate a property relevance-aware masking mechanism and diversified perturbation strategies to enhance generation quality under sparse annotations. Experiments demonstrate that HSPAG captures fine-grained structure-property relationships and supports controllable generation under multiple property constraints. Two real-world case studies further validate the editing capabilities of HSPAG.
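The sample-selection step in the abstract (representative samples from scaffold clusters, hard samples from an auxiliary VAE) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: each record is assumed to carry a precomputed scaffold key (for example a Bemis-Murcko scaffold SMILES from a cheminformatics toolkit) and a reconstruction loss from some auxiliary VAE, and the rules "lowest loss per cluster is representative" and "highest remaining loss is hard" are hypothetical choices.

```python
from collections import defaultdict


def select_samples(records, per_cluster=1, n_hard=2):
    """Pick representative and hard samples from annotated molecules.

    records: list of dicts with keys 'smiles', 'scaffold', 'recon_loss'.
    'scaffold' and 'recon_loss' are assumed to be precomputed elsewhere.
    """
    # Group molecules that share a scaffold key into one cluster.
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["scaffold"]].append(rec)

    # Assumption: the best-reconstructed members stand in for the cluster.
    representative = []
    for scaffold in sorted(clusters):
        members = sorted(clusters[scaffold], key=lambda r: r["recon_loss"])
        representative.extend(members[:per_cluster])

    # Assumption: the worst-reconstructed remaining molecules are "hard".
    chosen = {r["smiles"] for r in representative}
    remaining = [r for r in records if r["smiles"] not in chosen]
    hard = sorted(remaining, key=lambda r: r["recon_loss"], reverse=True)[:n_hard]
    return representative, hard
```

The union of the two returned lists would form the reduced pre-training set in this sketch; how HSPAG actually combines the two criteria is not stated in the abstract.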
Problem

Research questions and friction points this paper is trying to address.

Capturing complex molecular structure-property relationships remains challenging
Narrow coverage and incomplete annotations weaken property-based models
Data-efficient molecular generation under multiple property constraints is needed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical alignment of molecular structures and properties
Scaffold clustering and VAE for sample selection
Property-aware masking and perturbation for sparse annotations
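To make the idea of property-aware masking under sparse annotations more concrete, here is a minimal sketch. It is not the mechanism from the paper, whose details are not given on this page. The assumptions are: property values are stored per molecule with `None` for missing annotations, relevance is approximated by the mean absolute Pearson correlation with the other properties over co-observed molecules, and more relevant properties are masked more often. The names `relevance_scores`, `mask_properties`, and `base_rate` are hypothetical.

```python
import math
import random

MASK = "[MASK]"


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def relevance_scores(rows):
    """Mean |r| of each property with the others, using co-observed rows only."""
    n_props = len(rows[0])
    scores = []
    for i in range(n_props):
        total = 0.0
        for j in range(n_props):
            if i == j:
                continue
            pairs = [(r[i], r[j]) for r in rows
                     if r[i] is not None and r[j] is not None]
            total += abs(pearson([p[0] for p in pairs], [p[1] for p in pairs]))
        scores.append(total / max(n_props - 1, 1))
    return scores


def mask_properties(row, scores, base_rate=0.3, rng=random):
    """Mask one molecule's property vector for training.

    Missing annotations are always masked; observed ones are masked with
    probability min(1, base_rate * (1 + relevance)), an assumed rule.
    """
    masked = []
    for value, score in zip(row, scores):
        if value is None:
            masked.append(MASK)
        elif rng.random() < min(1.0, base_rate * (1.0 + score)):
            masked.append(MASK)
        else:
            masked.append(value)
    return masked
```

Passing a seeded `random.Random` instance as `rng` makes the masking reproducible. Treating missing annotations as masked tokens lets incompletely annotated molecules stay in the training set instead of being discarded.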
Ziyu Fan
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Zhijian Huang
Biochemistry Department and Beckman Institute, University of Illinois at Urbana-Champaign
Modeling and simulation, quantum chemistry, membrane transporters and channels
Yahan Li
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Xiaowen Hu
Siyuan Shen
School of Information Science and Technology, ShanghaiTech University
Computer vision, computational photography
Yunliang Wang
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Zeyu Zhong
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Shuhong Liu
The University of Tokyo
3DV, AI4S, Robotics
Shuning Yang
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Shangqian Wu
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Min Wu
Professor, IEEE Fellow, China University of Geosciences
Process control, robust control, intelligent systems
Lei Deng
School of Computer Science and Engineering, Central South University, Changsha 410083, China