🤖 AI Summary
This study addresses a limitation of the traditional Tukey boxplot: it tends to over-identify outliers in large samples, which compromises detection reliability. To mitigate this issue, the authors introduce two R packages, ChauBoxplot and AdaptiveBoxplot, that adjust outlier detection criteria according to sample size. ChauBoxplot employs an improved interquartile range estimator, while AdaptiveBoxplot implements an adaptive thresholding mechanism. Simulation experiments indicate that both methods substantially reduce false positive rates in large samples without sacrificing interpretability or statistical robustness. The paper further provides practical guidance for selecting between the two approaches, offering data analysts more accurate and stable tools for outlier detection in real-world applications.
📝 Abstract
Tukey's boxplot is widely used for outlier detection; however, its classic fixed-fence rule tends to flag an excessive number of outliers as the sample size grows. To address this limitation, we introduce two new R packages, ChauBoxplot and AdaptiveBoxplot, which implement more robust methods for outlier detection. We also provide practical guidance, drawn from simulation results, to help practitioners choose suitable boxplot methods and balance interpretability with statistical reliability.
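The over-flagging behaviour of the fixed-fence rule can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, and it does not use or reproduce ChauBoxplot or AdaptiveBoxplot; the function name `tukey_outlier_count`, the seed, and the sample sizes are assumptions chosen for illustration. It applies the classic fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR to clean standard normal samples that contain no true outliers.

```python
import numpy as np


def tukey_outlier_count(x, k=1.5):
    """Count points outside the classic Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(np.sum((x < lower) | (x > upper)))


# Illustrative setup (assumed, not from the paper): clean normal data only.
rng = np.random.default_rng(0)
counts = {
    n: tukey_outlier_count(rng.standard_normal(n))
    for n in (100, 10_000, 1_000_000)
}

for n, c in counts.items():
    print(f"n = {n:>9,}: {c} points flagged ({c / n:.3%})")
```

For normal data, roughly 0.7% of points fall outside these fences regardless of sample size, so the number of flagged points grows in proportion to n even though none of them is a genuine outlier. This is the behaviour that motivates fences that depend on sample size.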