🤖 AI Summary
This work systematically investigates how the quality of input conformers affects the predictive performance of 3D molecular representation learning models, specifically for predicting Sterimol parameters of DFT-optimized conformer ensembles. We formulate and empirically address three key questions: (i) whether low-quality conformers can substitute for high-quality ones when predicting target properties; (ii) whether the fidelity of geometry optimization matters when encoding random conformers; and (iii) how the absence of the active conformer affects model accuracy. Using graph neural networks with set encoders, multi-level conformational sampling, DFT-based benchmarking, and ablation studies, we uncover, for the first time, a nontrivial dependency between conformational quality and prediction objectives. Results show that low-quality conformers approximate Sterimol predictions with errors below 0.3 Å, yet omitting the active conformer degrades accuracy by over 40%; in certain cases, inexpensive conformational ensembles yield better estimates than surrogate models. We thus propose a new "co-design principle" that integrates conformational sampling and modeling.
📝 Abstract
Training machine learning models to predict properties of molecular conformer ensembles is an increasingly popular strategy to accelerate the conformational analysis of drug-like small molecules, reactive organic substrates, and homogeneous catalysts. For high-throughput analyses especially, trained surrogate models can help circumvent traditional approaches to conformational analysis that rely on expensive conformer searches and geometry optimizations. Here, we question how the performance of surrogate models for predicting 3D conformer-dependent properties (of a single, active conformer) is affected by the quality of the 3D conformers used as their input. How well do lower-quality conformers inform the prediction of properties of higher-quality conformers? Does the fidelity of geometry optimization matter when encoding random conformers? For models that encode sets of conformers, how does the presence of the active conformer that induces the target property affect model accuracy? How do predictions from a surrogate model compare to estimating the properties from cheap ensembles themselves? We explore these questions in the context of predicting Sterimol parameters of conformer ensembles optimized with density functional theory. Although answers will be case-specific, our analyses provide a valuable perspective on 3D representation learning models and raise practical considerations regarding when conformer quality matters.