🤖 AI Summary
This work addresses the challenge of constructing accurate surrogate models when high-fidelity data are scarce. It proposes a multifidelity Gaussian process regression method that embeds low-fidelity model outputs as additional features in an augmented input space, combining desirable properties of cokriging and autoregressive estimators. The approach balances predictive accuracy against computational cost within a single framework and exploits heterogeneous multi-source data without requiring additional modeling assumptions. Experiments on multiple benchmark problems show that the proposed method delivers higher predictive accuracy at lower computational cost than existing techniques.
📝 Abstract
Supervised machine learning describes the practice of fitting a parameterized model to labeled input-output data. Supervised machine learning methods have demonstrated promise in learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses, such as optimization, uncertainty quantification, and inference, tractable. However, when training data must be obtained through the evaluation of an expensive model or experiment, the amount of training data that can be obtained is often limited, which can make learned surrogate models unreliable. Fortunately, in many engineering and scientific settings, cheaper \emph{low-fidelity} models may be available, for example arising from simplified physics modeling or coarse grids. These models may be used to generate additional low-fidelity training data. The goal of \emph{multifidelity} machine learning is to use both high- and low-fidelity training data to learn a surrogate model that is cheaper to evaluate than the high-fidelity model, but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression which uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties from two separate classes of existing multifidelity GPR approaches, cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.
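The input-augmentation idea can be sketched as follows. This is a minimal, illustrative implementation, not the paper's method: the toy functions `f_hi`/`f_lo`, the kernel choice, and the sample sizes are all assumptions made for the example; the paper's features and training procedure may differ. The core step is the same, though: the GP is trained on inputs `(x, f_lo(x))` rather than `x` alone, so the low-fidelity model supplies an extra feature at both training and prediction time.

```python
# Sketch: multifidelity GPR via input augmentation with a low-fidelity feature.
# f_hi / f_lo, kernel, and sample sizes are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def f_hi(x):  # stand-in for an expensive high-fidelity model
    return np.sin(8.0 * x) * x

def f_lo(x):  # stand-in for a cheap, correlated low-fidelity model
    return 0.8 * np.sin(8.0 * x) * x + 0.2 * x

# Only a few expensive high-fidelity evaluations are available.
X_hi = rng.uniform(0.0, 1.0, size=(8, 1))
y_hi = f_hi(X_hi).ravel()

# Augment each input x with the low-fidelity output f_lo(x) as a feature,
# so the GP learns y_hi as a function of (x, f_lo(x)).
X_aug = np.hstack([X_hi, f_lo(X_hi)])

gp = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=[0.2, 0.2]),
    normalize_y=True,
).fit(X_aug, y_hi)

# At prediction time, evaluate the cheap model first to build the feature.
X_test = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y_pred, y_std = gp.predict(np.hstack([X_test, f_lo(X_test)]), return_std=True)
```

Because the low-fidelity output is strongly correlated with the high-fidelity response, the GP's job reduces to learning a (simpler) correction map, which is why few high-fidelity samples can suffice.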