🤖 AI Summary
This work addresses key limitations of molecular optimization driven by natural-language prompts, namely limited data scalability, chemical hallucinations, and neglect of the context dependence of fragment effects, by introducing FORGE, a two-stage framework that reframes optimization as a context-aware local editing task. In the first stage, candidate fragments are ranked by their contribution to the target property given the full molecular context; in the second stage, the model generates explicit fragment replacements. FORGE replaces manual text annotations with automatically mined, verified low-to-high edit pairs as fragment-level supervision, yielding a scalable, hallucination-free training paradigm. It also adapts to unseen black-box objectives through in-context demonstrations. Built on a 0.6B-parameter language model, FORGE consistently outperforms prior methods, including substantially larger language models and graph-based methods, on the Prompt-MolOpt, PMO-1k, and ChemCoTBench benchmarks.
📝 Abstract
Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. FORGE is trained on automatically mined, verified low-to-high edit pairs rather than expensive human text annotations. Stage 1 ranks candidate fragments by their property contribution under the full molecular context, which injects a chemical prior, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1k, and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph-based methods. These results highlight the value of explicit fragment-level supervision as a more easily obtainable, scalable, and hallucination-free alternative to natural-language training.
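The idea of mining low-to-high edit pairs can be illustrated with a toy sketch. The abstract does not describe the actual mining or verification procedure, so everything below is an assumption made for illustration: molecules are represented as tuples of hypothetical fragment labels, the scores are invented, and `mine_edit_pairs` is a made-up helper, not part of FORGE. A real pipeline would fragment actual structures and check chemical validity.

```python
from itertools import combinations


def mine_edit_pairs(scored_molecules):
    """Collect single-fragment edits that raise the property score.

    scored_molecules maps a tuple of fragment labels (a toy stand-in for a
    fragmented molecule) to a property score. Each returned item is
    (low_molecule, position, old_fragment, new_fragment).
    """
    pairs = []
    for (mol_a, score_a), (mol_b, score_b) in combinations(scored_molecules.items(), 2):
        # Only compare molecules with the same number of fragments and different scores.
        if len(mol_a) != len(mol_b) or score_a == score_b:
            continue
        # Positions where the two molecules carry different fragments.
        diff = [i for i, (fa, fb) in enumerate(zip(mol_a, mol_b)) if fa != fb]
        if len(diff) != 1:
            continue  # keep local edits only: exactly one fragment changed
        # Orient the pair from the lower-scoring to the higher-scoring molecule.
        low, high = (mol_a, mol_b) if score_a < score_b else (mol_b, mol_a)
        i = diff[0]
        pairs.append((low, i, low[i], high[i]))
    return pairs


# Invented toy data: fragment labels and scores are purely illustrative.
toy = {
    ("benzene", "amide", "methyl"): 0.40,
    ("benzene", "amide", "fluoro"): 0.55,
    ("pyridine", "amide", "methyl"): 0.35,
    ("pyridine", "ester", "fluoro"): 0.20,
}
edit_pairs = mine_edit_pairs(toy)
for low, position, old, new in edit_pairs:
    print(f"{low}: replace {old!r} at position {position} with {new!r}")
```

Each mined tuple keeps the full low-scoring molecule alongside the replaced fragment, which mirrors the abstract's point that a fragment's effect is judged under the full molecular context and requires no text annotation.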