Amortized Molecular Optimization via Group Relative Policy Optimization

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing molecular optimization methods are predominantly instance-based optimizers, suffering from limited generalization and high computational costs. This work proposes a serialized molecular generation framework built upon a pretrained graph Transformer, integrated with a novel Group Relative Policy Optimization (GRPO) algorithm. During reinforcement learning fine-tuning, GRPO mitigates the variance in policy learning caused by differing difficulties of starting molecular structures through reward normalization relative to initial molecules. The approach achieves efficient and transferable multi-objective property optimization on out-of-distribution molecular scaffolds without requiring oracle calls or post-processing during inference, matching the performance of state-of-the-art instance optimizers.

📝 Abstract
Molecular design encompasses tasks ranging from de-novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy transferable to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, adapting a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning to mitigate variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without inference-time oracle calls or refinement, achieving scores in multi-objective optimization competitive with leading instance optimizers.
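The core variance-reduction idea in the abstract — normalizing rewards within the group of rollouts sampled from the same starting structure — can be illustrated with a minimal sketch. This is a generic GRPO-style advantage computation with hypothetical reward values, not the paper's actual implementation; GRXForm's exact normalization may differ in detail.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts that all
    started from the same initial molecule, so that easy and hard
    starting structures contribute advantages on the same scale."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical property scores for rollouts from two starting molecules:
# an "easy" scaffold with high absolute rewards and a "hard" one with
# low absolute rewards, but the same relative spread.
easy = group_relative_advantages([0.90, 0.80, 0.85, 0.95])
hard = group_relative_advantages([0.20, 0.10, 0.15, 0.25])
```

Because each group is centered and scaled by its own statistics, both starting molecules yield identical advantage vectors here, so the policy gradient is not dominated by whichever scaffold happens to admit higher absolute rewards.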
Problem

Research questions and friction points this paper is trying to address.

molecular design
policy generalization
reward variance
out-of-distribution generalization
goal-directed optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Amortized optimization
Group Relative Policy Optimization
Molecular design
Graph Transformer
Out-of-distribution generalization
Muhammad bin Javaid
RWTH Aachen University, Department of Computer Science, Ahornstrasse 55, 52074 Aachen, Germany
Hasham Hussain
RWTH Aachen University, Department of Computer Science, Ahornstrasse 55, 52074 Aachen, Germany; Alfred E. Tiefenbacher (GmbH & Co. KG), Van-der-Smissen-Strasse 1, 22767 Hamburg, Germany
Ashima Khanna
Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany; University of Applied Sciences Weihenstephan-Triesdorf, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
Berke Kisin
RWTH Aachen University, Department of Computer Science, Ahornstrasse 55, 52074 Aachen, Germany
Jonathan Pirnay
Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany; University of Applied Sciences Weihenstephan-Triesdorf, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
Alexander Mitsos
AVT Systemverfahrenstechnik, RWTH Aachen University and Energy Systems Engineering IEK-10
process systems engineering, energy systems, global optimization, bilevel optimization, process
Dominik G. Grimm
Professor, TUM Campus Straubing, HSWT
Bioinformatics, Machine Learning, Data Science, Computational Biotechnology, Precision Agriculture
M
Martin Grohe
RWTH Aachen University, Department of Computer Science, Ahornstrasse 55, 52074 Aachen, Germany