🤖 AI Summary
This work addresses the distortion that outliers cause in conventional mean and covariance estimates for interval-valued data. It extends the Minimum Covariance Determinant (MCD) estimator to the interval-valued setting, building on the Mallows-distance barycenter framework to obtain robust location and scale estimates and, from them, a robust Interval-Mahalanobis distance. Adaptive cutoff values on this distance are used to flag outliers. In simulations across various contamination levels, the method outperforms classical estimators in both covariance matrix estimation and outlier detection accuracy, and its applicability is illustrated on real-world datasets.
📝 Abstract
Interval-valued data are one of the most common symbolic data types and preserve the underlying variability of the data. The interval mean and covariance matrix can be estimated using the barycenter approach based on the Mallows distance. However, as with conventional data, these classical estimates can be significantly affected by anomalous data points, which are frequently present in real-life datasets. To address this problem, we develop a robust alternative that estimates location and scale by extending the Minimum Covariance Determinant estimator to interval-valued data. The algorithm yields a robust Interval-Mahalanobis distance, which can be used to detect anomalous observations based on adaptive cutoff values. Through extensive simulation studies across various contamination levels, we demonstrate that the interval-valued robust estimator consistently outperforms classical methods in covariance matrix estimation and achieves superior outlier detection accuracy. Finally, the applicability and effectiveness of the proposed method are illustrated through real-world datasets.
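To make the classical (non-robust) starting point concrete, the following is a minimal sketch of the Mallows-distance barycenter estimates, assuming values are uniformly distributed within each interval. Under that assumption the squared Mallows distance between two intervals reduces to the squared difference of centers plus one third of the squared difference of half-ranges. The function names are hypothetical, the 1/n normalization is an assumption, and the code illustrates the classical estimates only; it is not the robust estimator proposed in the paper.

```python
import numpy as np


def to_center_halfrange(lower, upper):
    """Convert interval bounds to (center, half-range) representation."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def mallows_sq(lower_a, upper_a, lower_b, upper_b):
    """Squared Mallows distance between intervals (uniformity assumed)."""
    c_a, r_a = to_center_halfrange(lower_a, upper_a)
    c_b, r_b = to_center_halfrange(lower_b, upper_b)
    return (c_a - c_b) ** 2 + (r_a - r_b) ** 2 / 3.0


def barycenter(lower, upper):
    """Interval mean: average center and average half-range per variable.

    `lower` and `upper` have shape (n_observations, n_variables).
    """
    c, r = to_center_halfrange(lower, upper)
    c_bar, r_bar = c.mean(axis=0), r.mean(axis=0)
    return c_bar - r_bar, c_bar + r_bar


def interval_covariance(lower, upper):
    """Classical interval covariance under the uniformity assumption.

    Combines the covariance of centers with one third of the covariance
    of half-ranges, normalized by n (an assumption of this sketch).
    """
    c, r = to_center_halfrange(lower, upper)
    dc = c - c.mean(axis=0)
    dr = r - r.mean(axis=0)
    n = c.shape[0]
    return (dc.T @ dc + dr.T @ dr / 3.0) / n
```

With this normalization, each diagonal entry of the covariance equals the average squared Mallows distance from the observations to the barycenter, which is why a few extreme intervals can inflate it and why a robust alternative is needed.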