A Systematic Evaluation of Molecular Mixture Behavior Prediction

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for molecular property prediction focus primarily on pure compounds and struggle to model non-ideal interactions in mixtures. Evaluation based solely on absolute error also conflates pure-component contributions with deviations from ideal mixing, which can hide a model's failure to capture non-ideal behavior. This work proposes a systematic evaluation framework that decomposes prediction error into contributions from pure components and from non-ideal interactions. The framework combines leakage-aware data splits, an ideal-mixture baseline, and excess-property metrics, and the authors curate seven matched pure-compound and mixture datasets to enable reproducible benchmarking. Experiments show that strong absolute accuracy can mask poor recovery of non-ideal behavior and that performance drops substantially under strict molecule splits. The results identify transfer to unseen molecules as a core challenge and motivate evaluating mixture property prediction beyond absolute accuracy alone.
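To make the idea of a split that holds out whole molecules concrete, here is a minimal sketch. It is not the paper's protocol, which this page does not specify. The function name `molecule_holdout_split`, the `test_fraction` parameter, and the rule that any mixture containing a held-out molecule goes to the test set are all assumptions made for illustration.

```python
import random


def molecule_holdout_split(mixtures, test_fraction=0.2, seed=0):
    """Split mixtures so that held-out molecules never appear in training.

    `mixtures` is a list of tuples of molecule identifiers (e.g. SMILES strings).
    Assumed rule: a mixture goes to the test set if it contains at least one
    held-out molecule; otherwise it goes to the training set.
    """
    # Collect the unique molecules and shuffle them reproducibly.
    molecules = sorted({mol for mix in mixtures for mol in mix})
    rng = random.Random(seed)
    rng.shuffle(molecules)

    # Hold out a fraction of the molecules (at least one).
    n_held_out = max(1, round(test_fraction * len(molecules)))
    held_out = set(molecules[:n_held_out])

    train_idx, test_idx = [], []
    for i, mix in enumerate(mixtures):
        if any(mol in held_out for mol in mix):
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx, held_out


# Hypothetical toy data: five binary mixtures over molecules A to E.
toy_mixtures = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
train_idx, test_idx, held_out = molecule_holdout_split(toy_mixtures)
```

A random split over mixture rows, by contrast, would typically place the same molecule on both sides, which is the kind of leakage a molecule-level split is meant to avoid.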

📝 Abstract

Machine learning for molecular property prediction has focused largely on pure compounds, even though many practical applications depend on mixtures with intermolecular interactions. Recent work has expanded the availability of mixture datasets, but evaluation still focuses mainly on absolute accuracy. However, absolute errors in mixtures conflate pure-component contributions with deviations from ideal mixing. We propose an evaluation framework that decomposes mixture-property error into pure-compound and interaction (non-ideal) components. The framework combines leakage-aware split protocols, ideal-mixture baselines, and excess-property metrics. To support reproducible benchmarking, we curate seven matched pure and mixture physicochemical property datasets. Across multiple mixture-property tasks and model families, we find that strong absolute accuracy can mask poor recovery of non-ideal mixture behavior, and that performance drops substantially under strict molecule splits. These results identify transfer to unseen molecules as a central challenge in molecular mixture machine learning and motivate evaluation beyond absolute accuracy alone.
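The decomposition described in the abstract can be illustrated with a short sketch. It assumes the ideal-mixture baseline is a mole-fraction-weighted average of pure-component values and that the excess property is the mixture value minus that baseline; the paper's exact definitions may differ (some properties mix ideally on a transformed scale). The function names and the toy numbers are hypothetical.

```python
def ideal_mixture(mole_fractions, pure_values):
    """Assumed ideal baseline: mole-fraction-weighted average of pure values."""
    return sum(x * y for x, y in zip(mole_fractions, pure_values))


def decompose_error(mole_fractions, pure_true, pure_pred, mix_true, mix_pred):
    """Split the mixture error into a pure-component part and an excess part.

    By construction, total_error == pure_error + excess_error.
    """
    ideal_true = ideal_mixture(mole_fractions, pure_true)
    ideal_pred = ideal_mixture(mole_fractions, pure_pred)
    excess_true = mix_true - ideal_true  # measured deviation from ideal mixing
    excess_pred = mix_pred - ideal_pred  # predicted deviation from ideal mixing
    return {
        "total_error": mix_pred - mix_true,
        "pure_error": ideal_pred - ideal_true,
        "excess_error": excess_pred - excess_true,
    }


# Toy equimolar binary mixture (made-up numbers, arbitrary units).
result = decompose_error(
    mole_fractions=[0.5, 0.5],
    pure_true=[1.0, 3.0],
    pure_pred=[1.5, 3.5],
    mix_true=1.5,
    mix_pred=2.0,
)
```

In this toy case the whole error of 0.5 comes from the pure-component predictions and the excess error is zero, so the total error alone would not reveal which part of the prediction went wrong. The reverse case, where a small total error coexists with a large excess error, is the failure mode the abstract warns about.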

Problem

Research questions and friction points this paper is trying to address.

molecular mixtures

non-ideal behavior

machine learning

property prediction

transferability

Innovation

Methods, ideas, or system contributions that make the work stand out.

mixture property prediction

excess property

non-ideal mixing