Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models

📅 2026-04-14

🤖 AI Summary
Molecular optimization methods often struggle to improve target properties while preserving the molecular scaffold, and they frequently suffer from poor chemical validity and limited controllability. This work proposes the Scaffold-Conditioned Preference Triplet (SCPT) framework, which constructs ⟨scaffold, better, worse⟩ triplets through scaffold alignment and chemical rule-based filtering, and uses them to bring chemical priors into the preference learning that fine-tunes pretrained molecular large language models. The approach outperforms baseline methods in both single- and multi-objective optimization, delivering controllable property improvements while maintaining high scaffold similarity. It also generalizes across objectives: models trained on single- or dual-objective data transfer to triple-objective tasks, and the data-construction settings yield a predictable trade-off frontier between structural similarity and property gain.
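The triplet construction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Candidate` fields, the `build_triplets` helper, the thresholds `min_similarity` and `min_gain`, and all toy fingerprints and scores are assumptions made for illustration. Scaffold keys, fingerprints, and filter outcomes are taken as precomputed inputs (for example from a cheminformatics toolkit), so the block needs only the standard library.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    smiles: str             # molecule as a SMILES string
    scaffold: str           # precomputed scaffold key (assumed input)
    fingerprint: frozenset  # on-bits of a precomputed fingerprint (assumed input)
    score: float            # property value, higher is better (toy values below)
    passes_filters: bool    # result of validity / synthesizability checks (assumed input)


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_triplets(candidates, min_similarity=0.4, min_gain=0.1):
    """Pair molecules that share a scaffold into (scaffold, better, worse)."""
    by_scaffold = defaultdict(list)
    for cand in candidates:
        if cand.passes_filters:  # drop molecules rejected by the chemistry filters
            by_scaffold[cand.scaffold].append(cand)

    triplets = []
    for scaffold, group in by_scaffold.items():
        for x, y in combinations(group, 2):
            # Similarity constraint: keep only structurally close pairs.
            if tanimoto(x.fingerprint, y.fingerprint) < min_similarity:
                continue
            better, worse = (x, y) if x.score >= y.score else (y, x)
            # Keep only pairs with a meaningful property gain.
            if better.score - worse.score >= min_gain:
                triplets.append((scaffold, better.smiles, worse.smiles))
    return triplets


# Toy data: fingerprints and scores are made up for the example.
example = [
    Candidate("Cc1ccccc1", "c1ccccc1", frozenset({1, 2, 3, 4}), 0.30, True),
    Candidate("Oc1ccccc1", "c1ccccc1", frozenset({1, 2, 3, 5}), 0.55, True),
    Candidate("CCCCCCCCc1ccccc1", "c1ccccc1", frozenset({1, 9, 10, 11, 12}), 0.90, True),
    Candidate("Nc1ccccc1", "c1ccccc1", frozenset({1, 2, 3, 6}), 0.95, False),
    Candidate("Cc1ccncc1", "c1ccncc1", frozenset({1, 2, 7, 8}), 0.70, True),
]
triplets = build_triplets(example)
print(triplets)
```

In this sketch, raising `min_similarity` or `min_gain` trades the number of triplets against how closely the pairs match or how large the gains are, which is one plausible reading of how construction settings could shape a similarity-gain trade-off.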

📝 Abstract
Molecular property optimization is central to drug discovery, yet many deep learning methods rely on black-box scoring and offer limited control over scaffold preservation, often producing unstable or biologically implausible edits. While large language models (LLMs) are promising molecular generators, optimization remains constrained by the lack of chemistry-grounded preference supervision and principled data curation. We introduce Scaffold-Conditioned Preference Triplets (SCPT), a pipeline that constructs similarity-constrained triplets ⟨scaffold, better, worse⟩ via scaffold alignment and chemistry-driven filters for validity, synthesizability, and meaningful property gains. Using these preferences, we align a pretrained molecular LLM as a conditional editor, enabling property-improving edits that retain the scaffold. Across single- and multi-objective benchmarks, SCPT improves optimization success and property gains while maintaining higher scaffold similarity than competitive baselines. Compared with representative non-LLM molecular optimization methods, SCPT-trained LLMs are better suited to scaffold-constrained and multi-objective optimization. In addition, models trained on single-property and two-property supervision generalize effectively to three-property tasks, indicating promising extrapolative generalization under limited higher-order supervision. SCPT also provides controllable data-construction knobs that yield a predictable similarity-gain frontier, enabling systematic adaptation to diverse optimization regimes.
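The abstract does not name the objective used to align the model with the preference triplets. As one illustration only, the sketch below assumes a DPO-style pairwise loss, a common choice for preference alignment; the function name `preference_loss`, its arguments, and the `beta` value are hypothetical and not taken from the source. The inputs are sequence log-probabilities of the better and worse molecule given the scaffold, under the model being trained and under a frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(logp_better, logp_worse,
                    ref_logp_better, ref_logp_worse, beta=0.1):
    """DPO-style loss for one (scaffold, better, worse) triplet (assumed objective).

    Each argument is a log-probability of a molecule conditioned on the
    scaffold: logp_* from the trained model, ref_logp_* from a frozen reference.
    """
    # How much more the trained model favors "better" over "worse"
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_better - ref_logp_better)
                     - (logp_worse - ref_logp_worse))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


# Toy log-probabilities, made up for the example.
neutral = preference_loss(-11.0, -11.0, -11.0, -11.0, beta=1.0)
aligned = preference_loss(-10.0, -12.0, -11.0, -11.0, beta=1.0)
print(neutral, aligned)
```

The loss falls as the trained model shifts probability toward the better molecule relative to the reference, which is the behavior any pairwise preference objective of this kind is expected to have.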
Problem

Research questions and friction points this paper is trying to address.

molecular optimization
scaffold preservation
preference learning
large language models
controllable generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaffold-Conditioned Preference Triplets
molecular optimization
large language models
preference learning
controllable generation
Yi Xiong
National Key Laboratory of Parallel and Distributed Processing, College of Computer Science and Technology, National University of Defense Technology
Liang Xiong
National Key Laboratory of Parallel and Distributed Processing, College of Computer Science and Technology, National University of Defense Technology
Xiaohong Ji
DP Technology
Sen Yang
National Key Laboratory of Parallel and Distributed Processing, College of Computer Science and Technology, National University of Defense Technology
Zhifeng Gao
DP Technology
Data Mining, Machine Learning, AI for Science, AI for Industry
Huaimin Wang
National Key Laboratory of Parallel and Distributed Processing, College of Computer Science and Technology, National University of Defense Technology
Kele Xu
National Key Laboratory of Parallel and Distributed Processing, College of Computer Science and Technology, National University of Defense Technology