🤖 AI Summary
Deep learning models often encounter missing modalities during inference, and existing approaches (imputation, explicit marginalization, or multi-model ensembles) suffer from high computational overhead, degraded accuracy, or reliance on prior knowledge of missingness patterns. To address this, we propose Knockout training: a paradigm that achieves implicit marginalization by randomly replacing input features with appropriate placeholder values during training, enabling a single end-to-end neural network to jointly model both the conditional distribution under full inputs and the marginal distributions over feature subsets. Knockout is theoretically justified, requires no prior knowledge of missing-input patterns, incurs negligible overhead during both training and inference, and eliminates the need for auxiliary imputation models. Experiments on a wide range of simulations and real-world multimodal datasets show that Knockout offers strong empirical performance relative to marginalization, imputation, and multi-model baselines, at substantially lower deployment cost.
📝 Abstract
Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multiple modalities) can be difficult to deploy widely, because some inputs may be missing at inference time. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can yield calibrated predictions, but it is computationally costly and therefore feasible only for low-dimensional inputs. Imputation may result in inaccurate predictions because it relies on point estimates for missing variables, and it does not work well for high-dimensional inputs (e.g., images). Training multiple models, each taking a different subset of inputs, can work well but requires knowing the missing-input patterns in advance; moreover, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.
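The training-time augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the placeholder value, the per-feature knockout probability `p`, and the function name `knockout` are all assumptions made here for concreteness (the paper only specifies that features are replaced with "appropriate placeholder values").

```python
import numpy as np

def knockout(x, placeholder, p=0.5, rng=None):
    """Randomly replace input features with placeholder values.

    x           : (batch, n_features) input array
    placeholder : (n_features,) placeholder value per feature
                  (assumed here to be a constant outside the data range)
    p           : probability of knocking out each feature independently
    Returns the augmented batch and the boolean knockout mask.
    """
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) < p          # True -> feature is knocked out
    return np.where(mask, placeholder, x), mask

# During training, each batch is augmented before the forward pass, so a
# single network sees all (random) subsets of present features:
#   x_aug, _ = knockout(x_batch, placeholder)
#   loss = criterion(model(x_aug), y_batch)
# At inference, genuinely missing features are filled with the same
# placeholder values, and no imputation or extra models are needed.
```

Setting `p` controls how often the network trains on partial inputs versus full inputs; the choice of placeholder and knockout schedule would follow the paper's recommendations in practice.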