🤖 AI Summary
Problem: Research on 3D molecular generation relies heavily on the GEOM-Drugs dataset for evaluation, yet the standard preprocessing and evaluation protocols contain serious chemical errors: incorrect valency definitions, bugs in bond order calculation, and reliance on classical force fields that are inconsistent with the reference data. These errors undermine the chemical validity of the reported metrics.
Method: We diagnose and correct these flaws with an evaluation framework that puts chemical consistency first: (i) constructing chemically accurate valency tables; (ii) replacing empirical force fields with the semiempirical quantum-chemical method GFN2-xTB for geometry optimization and energy computation, so that the benchmark matches the reference data; and (iii) fixing the data preprocessing, including validation of molecular topology.
Contribution/Results: Re-evaluating state-of-the-art generative models under this framework reveals substantial overestimation of prior performance metrics. We publicly release corrected data protocols and evaluation scripts to foster community-wide adoption of chemically sound 3D molecular generation assessment standards.
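To make the valency-table idea concrete, here is a minimal sketch of a table-based validity check. It is not the authors' implementation: the table `ALLOWED_VALENCIES` is a deliberately tiny, hypothetical example, the function names are illustrative, and bonds are assumed to carry integer (kekulized) orders.

```python
# Hypothetical, intentionally incomplete table for illustration only:
# (element, formal charge) -> set of allowed total valencies.
ALLOWED_VALENCIES = {
    ("H", 0): {1},
    ("C", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
    ("O", 0): {2},
    ("O", -1): {1},
    ("S", 0): {2, 4, 6},
}


def atom_valencies(num_atoms, bonds):
    """Sum the bond orders on each atom.

    bonds is a list of (i, j, order) with integer orders (assumed kekulized).
    """
    totals = [0] * num_atoms
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    return totals


def invalid_atoms(atoms, bonds):
    """Return indices of atoms whose total valency is not in the table.

    atoms is a list of (element, formal_charge). An (element, charge) pair
    missing from the table is treated as invalid.
    """
    totals = atom_valencies(len(atoms), bonds)
    bad = []
    for idx, (key, total) in enumerate(zip(atoms, totals)):
        if total not in ALLOWED_VALENCIES.get(key, set()):
            bad.append(idx)
    return bad


# Methane: one carbon bonded to four hydrogens.
methane_atoms = [("C", 0)] + [("H", 0)] * 4
methane_bonds = [(0, k, 1) for k in range(1, 5)]
print(invalid_atoms(methane_atoms, methane_bonds))
```

Keying the table on both element and formal charge is the design choice that matters here: a nitrogen with four bonds passes as an ammonium-type N(+1) but fails as a neutral N.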
📝 Abstract
Deep generative models have shown significant promise in generating valid 3D molecular structures, with the GEOM-Drugs dataset serving as a key benchmark. However, current evaluation protocols suffer from critical flaws, including incorrect valency definitions, bugs in bond order calculations, and reliance on force fields inconsistent with the reference data. In this work, we revisit GEOM-Drugs and propose a corrected evaluation framework: we identify and fix issues in data preprocessing, construct chemically accurate valency tables, and introduce a GFN2-xTB-based geometry and energy benchmark. We retrain and re-evaluate several leading models under this framework, providing updated performance metrics and practical recommendations for future benchmarking. Our results underscore the need for chemically rigorous evaluation practices in 3D molecular generation. Our recommended evaluation methods and GEOM-Drugs processing scripts are available at https://github.com/isayevlab/geom-drugs-3dgen-evaluation.
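As an illustration of what a geometry benchmark against relaxed structures can measure, the sketch below compares bond lengths in a generated structure with those in its relaxed counterpart. It assumes the relaxed coordinates were already produced by a separate GFN2-xTB optimization, which is not run here. The function names and the choice of mean absolute bond-length deviation are assumptions made for illustration, not the metrics defined in the paper.

```python
import math


def bond_lengths(coords, bonds):
    """Euclidean length of each bond; coords are (x, y, z) tuples in angstroms."""
    return [math.dist(coords[i], coords[j]) for i, j in bonds]


def mean_abs_bond_deviation(generated, relaxed, bonds):
    """Mean absolute difference in bond length between two geometries.

    generated and relaxed must list the same atoms in the same order;
    bonds is a list of (i, j) atom-index pairs.
    """
    if not bonds:
        raise ValueError("at least one bond is required")
    gen = bond_lengths(generated, bonds)
    ref = bond_lengths(relaxed, bonds)
    return sum(abs(g - r) for g, r in zip(gen, ref)) / len(bonds)


# Toy diatomic example: the generated bond is 0.1 angstrom too long.
generated = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
relaxed = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
deviation = mean_abs_bond_deviation(generated, relaxed, [(0, 1)])
print(f"{deviation:.3f}")
```

Because bond lengths are internal coordinates, this comparison needs no structural alignment between the two geometries.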