AI Summary
Existing methods for molecular property prediction focus primarily on pure compounds and struggle to model non-ideal interactions in mixtures. Moreover, evaluation based solely on absolute error can obscure a model's failure to capture such effects. This work proposes a systematic evaluation framework that decomposes prediction error into contributions from pure components and from non-ideal interactions. It introduces leakage-aware data splits, an ideal-mixture baseline, and excess-property metrics, and curates seven matched pure-compound and mixture datasets to enable reproducible benchmarking. Experiments show a substantial drop in performance on unseen molecules under strict molecule splits, identifying transfer to unseen molecules as a core challenge and motivating a shift from evaluation by absolute accuracy alone toward multidimensional evaluation of mixture-property prediction.
Abstract
Machine learning for molecular property prediction has focused largely on pure compounds, even though many practical applications depend on mixtures with intermolecular interactions. Recent work has expanded the availability of mixture datasets, but evaluation still focuses mainly on absolute accuracy. However, absolute errors in mixtures conflate pure-component contributions with deviations from ideal mixing. We propose an evaluation framework that decomposes mixture-property error into pure-compound and interaction (non-ideal) components. The framework combines leakage-aware split protocols, ideal-mixture baselines, and excess-property metrics. To support reproducible benchmarking, we curate seven matched pure and mixture physicochemical property datasets. Across multiple mixture-property tasks and model families, we find that strong absolute accuracy can mask poor recovery of non-ideal mixture behavior, and that performance drops substantially under strict molecule splits. These results identify transfer to unseen molecules as a central challenge in molecular mixture machine learning and motivate evaluation beyond absolute accuracy alone.
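The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the ideal-mixture baseline is a mole-fraction-weighted average of pure-component values (the abstract does not state the mixing rule), that the excess property is the mixture value minus that baseline, and that the model's pure-component predictions are available separately. All function names and numbers are hypothetical.

```python
import numpy as np


def ideal_mixture(x, pure):
    """Ideal-mixture baseline: mole-fraction-weighted average of pure values.

    ASSUMPTION: linear mixing in mole fraction; the source does not specify
    which ideal-mixing rule is used for each property.
    """
    x = np.asarray(x, dtype=float)
    pure = np.asarray(pure, dtype=float)
    return np.sum(x * pure, axis=-1)


def decompose_errors(x, pure_true, pure_pred, y_true, y_pred):
    """Split mixture-property error into pure and excess (non-ideal) parts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    ideal_true = ideal_mixture(x, pure_true)
    ideal_pred = ideal_mixture(x, pure_pred)

    # Excess property = mixture value minus its ideal-mixture baseline.
    excess_true = y_true - ideal_true
    excess_pred = y_pred - ideal_pred

    total_err = y_pred - y_true
    pure_err = ideal_pred - ideal_true
    excess_err = excess_pred - excess_true  # total_err = pure_err + excess_err

    return {
        "total_err": total_err,
        "pure_err": pure_err,
        "excess_err": excess_err,
        "excess_true": excess_true,
        "excess_pred": excess_pred,
        "mae_total": float(np.mean(np.abs(total_err))),
        "mae_pure": float(np.mean(np.abs(pure_err))),
        "mae_excess": float(np.mean(np.abs(excess_err))),
    }


# Hypothetical binary mixtures: three compositions of the same two compounds.
x = np.array([[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]])
pure_true = np.array([[1.0, 2.0]] * 3)
pure_pred = pure_true.copy()  # the model is exact on the pure compounds

# True mixture values deviate slightly from ideal mixing.
y_true = ideal_mixture(x, pure_true) + np.array([-0.10, -0.05, -0.02])
# A model that simply predicts ideal mixing, ignoring all interactions.
y_pred = ideal_mixture(x, pure_pred)

report = decompose_errors(x, pure_true, pure_pred, y_true, y_pred)
print(report["mae_total"], report["mae_pure"], report["mae_excess"])
```

In this toy case the absolute error is small relative to the property values, yet the predicted excess is zero everywhere, so the model recovers none of the non-ideal behavior. This is the kind of failure that an absolute-accuracy metric alone would not reveal.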