🤖 AI Summary
This work addresses the temperature dependence of diffusion coefficients at infinite dilution in binary mixtures, which matrix completion methods cannot capture: they are restricted to single-temperature predictions and depend on high-quality experimental data at each temperature of interest. The study proposes a hybrid tensor completion method based on a Tucker decomposition, trained within a Bayesian framework that uses predictions from the semi-empirical SEGWE model as prior knowledge. To improve data efficiency, active learning guides pulsed-field gradient (PFG) NMR experiments for targeted data acquisition; diffusion coefficients in 19 solute + solvent systems were newly measured at 298 K, 313 K, and 333 K. The model predicts more accurately than established models at the studied temperatures and extrapolates linearly to temperatures between 268 K and 378 K, and incorporating the new measurements improves its accuracy further.
📝 Abstract
Predicting diffusion coefficients in mixtures is crucial for many applications because experimental data remain scarce, and machine learning (ML) offers a promising alternative to established semi-empirical models. Among ML models, matrix completion methods (MCMs) have proven effective in predicting thermophysical properties, including diffusion coefficients in binary mixtures. However, MCMs are restricted to single-temperature predictions, and their accuracy depends strongly on the availability of high-quality experimental data for each temperature of interest. In this work, we address this limitation by presenting a hybrid tensor completion method (TCM) for predicting temperature-dependent diffusion coefficients at infinite dilution in binary mixtures. The TCM employs a Tucker decomposition and is trained jointly on experimental diffusion coefficients at infinite dilution in binary systems at 298 K, 313 K, and 333 K. Predictions from the semi-empirical SEGWE model serve as prior knowledge within a Bayesian training framework. The TCM then extrapolates linearly to any temperature between 268 K and 378 K and achieves markedly better prediction accuracy than established models across all studied temperatures. To further enhance predictive performance, the experimental database was expanded using active learning (AL) strategies for targeted acquisition of new diffusion data by pulsed-field gradient (PFG) NMR measurements. Diffusion coefficients at infinite dilution in 19 solute + solvent systems were measured at 298 K, 313 K, and 333 K. Incorporating these results yields a substantial improvement in the TCM's predictive accuracy. These findings highlight the potential of combining data-efficient ML methods with adaptive experimentation to advance predictive modeling of transport properties.
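To make the tensor structure concrete, the following is a minimal sketch of how a Tucker model reconstructs a solute × solvent × temperature tensor and how predictions at the three training temperatures could be extended linearly to another temperature. It is an illustration under stated assumptions, not the authors' implementation: the function names, the ranks, the random factor values, and the choice to fit a straight line directly in the modeled quantity versus temperature are all hypothetical, and the Bayesian training with SEGWE priors is not shown.

```python
import numpy as np


def tucker_reconstruct(core, U, V, W):
    """Rebuild the full tensor from a Tucker model.

    core: (r1, r2, r3) core tensor
    U:    (n_solutes, r1) solute factors
    V:    (n_solvents, r2) solvent factors
    W:    (n_temps, r3) temperature factors
    Returns an array of shape (n_solutes, n_solvents, n_temps).
    """
    return np.einsum("abc,ia,jb,tc->ijt", core, U, V, W)


def linear_in_temperature(values, temps, t_query):
    """Least-squares straight line through values along the last axis.

    values: (..., n_temps) model output at the training temperatures
    temps:  the n_temps training temperatures in K
    Returns the line evaluated at t_query, with shape values.shape[:-1].
    ASSUMPTION: linearity is applied to the modeled quantity itself.
    """
    temps = np.asarray(temps, dtype=float)
    dt = temps - temps.mean()
    v_mean = values.mean(axis=-1, keepdims=True)
    slope = ((values - v_mean) * dt).sum(axis=-1) / (dt ** 2).sum()
    intercept = v_mean[..., 0] - slope * temps.mean()
    return intercept + slope * t_query


# Hypothetical sizes and random factors, for illustration only.
rng = np.random.default_rng(0)
core = rng.normal(size=(2, 2, 2))
U = rng.normal(size=(4, 2))   # 4 solutes
V = rng.normal(size=(5, 2))   # 5 solvents
W = rng.normal(size=(3, 2))   # 3 training temperatures

train_temps = [298.0, 313.0, 333.0]
pred = tucker_reconstruct(core, U, V, W)                     # shape (4, 5, 3)
pred_268 = linear_in_temperature(pred, train_temps, 268.0)   # shape (4, 5)
```

In this sketch every solute + solvent pair shares the same temperature factors `W`, which is how a tensor model could let data measured at one temperature inform predictions at the others.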