GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation

📅 2025-04-30
🤖 AI Summary
Current 3D molecular generation research relies heavily on the GEOM-Drugs dataset for evaluation, yet the standard evaluation protocol suffers from serious chemical inaccuracies, including incorrect valency definitions, bugs in bond order calculations, and reliance on force fields that are inconsistent with the reference data. These flaws undermine the chemical validity of the reported metrics. Method: We diagnose and fix these flaws by introducing a corrected evaluation framework: (i) identifying and fixing issues in GEOM-Drugs data preprocessing; (ii) constructing chemically accurate valency tables; and (iii) replacing force-field-based evaluation with a GFN2-xTB-based geometry and energy benchmark that is consistent with the reference structures. Contribution/Results: We retrain and re-evaluate several leading generative models under this framework, providing updated performance metrics and practical recommendations for future benchmarking. We publicly release the recommended evaluation methods and GEOM-Drugs processing scripts to encourage community-wide adoption of chemically rigorous 3D molecular generation assessment.


📝 Abstract
Deep generative models have shown significant promise in generating valid 3D molecular structures, with the GEOM-Drugs dataset serving as a key benchmark. However, current evaluation protocols suffer from critical flaws, including incorrect valency definitions, bugs in bond order calculations, and reliance on force fields inconsistent with the reference data. In this work, we revisit GEOM-Drugs and propose a corrected evaluation framework: we identify and fix issues in data preprocessing, construct chemically accurate valency tables, and introduce a GFN2-xTB-based geometry and energy benchmark. We retrain and re-evaluate several leading models under this framework, providing updated performance metrics and practical recommendations for future benchmarking. Our results underscore the need for chemically rigorous evaluation practices in 3D molecular generation. Our recommended evaluation methods and GEOM-Drugs processing scripts are available at https://github.com/isayevlab/geom-drugs-3dgen-evaluation.
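The role of a valency table can be illustrated with the atom- and molecule-stability checks widely reported for 3D molecule generators: an atom counts as stable when its summed bond orders match an allowed valence for its element and formal charge, and a molecule is stable when every atom is. The sketch below is a minimal, hypothetical version that assumes kekulized molecules (aromatic rings written as alternating single and double bonds) with explicit hydrogens. The table entries and the names `VALENCY_TABLE`, `atom_is_stable`, and `molecule_is_stable` are illustrative assumptions, not the authors' released valency tables or evaluation code.

```python
from collections import defaultdict

# Hypothetical, illustrative valency table: (element, formal charge) -> allowed
# total bond orders. This is NOT the table released by the authors.
VALENCY_TABLE = {
    ("H", 0): {1},
    ("C", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
    ("O", 0): {2},
    ("O", -1): {1},
    ("S", 0): {2, 4, 6},
}


def atom_is_stable(element, charge, total_bond_order):
    """Return True if the summed (Kekulé) bond order is an allowed valence
    for this (element, formal charge) pair; unknown pairs are unstable."""
    return total_bond_order in VALENCY_TABLE.get((element, charge), set())


def molecule_is_stable(atoms, bonds):
    """atoms: list of (element, formal_charge); bonds: list of (i, j, order)
    with integer orders 1, 2, or 3 (assumes the molecule is kekulized)."""
    totals = defaultdict(int)
    for i, j, order in bonds:
        # Each bond contributes its order to both endpoint atoms.
        totals[i] += order
        totals[j] += order
    return all(
        atom_is_stable(element, charge, totals[idx])
        for idx, (element, charge) in enumerate(atoms)
    )
```

For example, formaldehyde, given as `[("C", 0), ("O", 0), ("H", 0), ("H", 0)]` with bonds `[(0, 1, 2), (0, 2, 1), (0, 3, 1)]`, passes the check. Summing only integer Kekulé orders avoids having to decide how aromatic bonds are counted. A real table, however, must still decide which charged states are chemically acceptable.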
Problem

Research questions and friction points this paper is trying to address.

Fixes flawed evaluation protocols in 3D molecular generation benchmarks
Corrects valency definitions and bond order calculations in GEOM-Drugs
Introduces chemically accurate geometry and energy benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Corrected data preprocessing and valency tables
Introduced GFN2-xTB-based geometry and energy benchmark
Retrained models with updated evaluation framework
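One way to read a GFN2-xTB-based geometry and energy benchmark is to compare each generated conformer with its own GFN2-xTB-relaxed counterpart. The sketch below assumes that both GFN2-xTB energies (in Hartree) and the relaxed coordinates (in Å) have already been produced externally, for example with the xtb program. The metrics shown (relaxation energy in kcal/mol and per-bond length deviation) and the function names are illustrative assumptions, not the authors' released evaluation scripts.

```python
import math

# Standard conversion factor: 1 Hartree in kcal/mol.
HARTREE_TO_KCAL_MOL = 627.509474


def relaxation_energy_kcal(e_generated_hartree, e_relaxed_hartree):
    """Energy drop (kcal/mol) when a generated conformer is relaxed with
    GFN2-xTB; both energies are assumed to be computed beforehand."""
    return (e_generated_hartree - e_relaxed_hartree) * HARTREE_TO_KCAL_MOL


def bond_length_deviations(coords_generated, coords_relaxed, bonds):
    """Absolute change in each bond length (same units as the coordinates)
    between the generated geometry and its relaxed counterpart.

    coords_*: sequences of (x, y, z); bonds: sequence of (i, j) atom indices.
    """
    deviations = []
    for i, j in bonds:
        d_gen = math.dist(coords_generated[i], coords_generated[j])
        d_rel = math.dist(coords_relaxed[i], coords_relaxed[j])
        deviations.append(abs(d_gen - d_rel))
    return deviations
```

Computing both energies with the same method that produced the reference structures keeps the comparison internally consistent. This is the concern behind replacing force-field energies that disagree with the reference data.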
Filipp Nikitin
CMU
Machine Learning, Computational Drug Discovery, Deep Learning
Ian Dunn
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
David Ryan Koes
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
Olexandr Isayev
Carl and Amy Jones Professor of Chemistry, Carnegie Mellon University
computational chemistry, AI for science, drug discovery, materials informatics