AI Summary
This study addresses the challenge of quantifying the sensitivity of classification models to feature errors in training data by proposing the Error Sensitivity Profile (ESP), a novel metric that systematically defines and measures the impact of single or multiple feature errors on model performance. Through experiments on two widely used datasets involving 14 classification models and leveraging a custom-developed dirty data toolkit, the authors demonstrate that model performance degradation is not necessarily correlated with simple feature-target variable associations. ESP effectively identifies the error types and critical features that most significantly impair predictive accuracy, thereby offering actionable guidance for prioritizing data cleaning efforts.
Abstract
The quality of training data is critical to the performance of machine learning models. In this paper, the Error Sensitivity Profile (ESP) is proposed. It quantifies the sensitivity of model performance to errors in a single feature or in multiple features. By leveraging ESP, data-cleaning efforts can be prioritized based on error types and features most likely to affect model performance. To support the computation of this metric, an integrated suite of tools, called \dirty, is created. We conduct an extensive experimental study on two widely used datasets using 14 classification models, revealing that performance degradation is not always predictable from simple correlations with the target variable.
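To make the idea of a per-feature sensitivity measurement concrete, the following is a minimal sketch and not the paper's definition of ESP or its toolkit. It assumes a synthetic two-feature dataset, a hand-written nearest-centroid classifier, and a single error type (random outliers); all function names (`make_data`, `inject_errors`, `sensitivity_profile`) are hypothetical. It corrupts one training feature at increasing error rates and records accuracy on a clean test set.

```python
import random

def make_data(n, rng):
    """Synthetic binary data: feature 0 is informative, feature 1 is pure noise."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x0 = rng.gauss(2.0 if y == 1 else -2.0, 1.0)
        x1 = rng.gauss(0.0, 1.0)
        rows.append(([x0, x1], y))
    return rows

def inject_errors(rows, feature, rate, rng):
    """Copy rows, replacing roughly a fraction `rate` of one feature with outliers."""
    dirty = []
    for x, y in rows:
        x = list(x)
        if rng.random() < rate:
            x[feature] = rng.uniform(0.0, 50.0)  # assumed error type: outlier
        dirty.append((x, y))
    return dirty

def fit_centroids(rows):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

def accuracy(centroids, rows):
    hits = sum(1 for x, y in rows if predict(centroids, x) == y)
    return hits / len(rows)

def sensitivity_profile(train, test, feature, rates, seed=0):
    """Train on dirty copies of `train`; evaluate each model on the clean `test`."""
    rng = random.Random(seed)
    profile = []
    for rate in rates:
        dirty = inject_errors(train, feature, rate, rng)
        model = fit_centroids(dirty)
        profile.append((rate, accuracy(model, test)))
    return profile

rng = random.Random(42)
train, test = make_data(400, rng), make_data(200, rng)
for feature in (0, 1):
    print(feature, sensitivity_profile(train, test, feature, [0.0, 0.1, 0.3, 0.5]))
```

Comparing the two printed profiles shows how such a measurement could be used to rank features for cleaning. A real study would swap in actual datasets, several error types, and the classifiers of interest.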