A test statistic, $h^*$, for outlier analysis

📅 2025-08-08

🤖 AI Summary

Traditional outlier detection methods (e.g., Grubbs' test, Dixon's Q) assume normality, which limits their ability to assess whether an outlier is meaningful in non-normal real-world data. To address this, the paper proposes h*, a parametric, frequentist statistic for evaluating global outliers without the normality assumption. h* quantifies how extreme a data point is relative to its group as a measure of statistical significance, and the paper extends it to Bayesian inference and paired analysis. A generalised form adds individual weights on differences and a variable sensitivity exponent. The approach treats outliers as phenomena worth investigating rather than noise to be discarded. On empirical data from a mood intervention study, h* is shown to distinguish stable extraordinary deviations from values that merely appear extreme under conventional criteria.

📝 Abstract

Outlier analysis is a critical tool across diverse domains, from clinical decision-making to cybersecurity and talent identification. Traditional statistical outlier detection methods, such as Grubbs' test and Dixon's Q, are predicated on the assumption of normality and often fail to reckon with the meaningfulness of exceptional values within non-normal datasets. In this paper, we introduce the h* statistic, a novel parametric, frequentist approach for evaluating global outliers without the normality assumption. Unlike conventional techniques that primarily remove outliers to preserve statistical 'integrity', h* assesses their distinctiveness as phenomena worthy of investigation by quantifying a data point's extremity relative to its group as a measure of statistical significance, analogous to the role of Student's t in comparing means. We detail the mathematical formulation of h* with tabulated confidence intervals of significance levels and extensions to Bayesian inference and paired analysis. The capacity of h* to discern between stable extraordinary deviations and values that merely appear extreme under conventional criteria is demonstrated using empirical data from a mood intervention study. A generalisation of h* is subsequently proposed, with individual weights assigned to differences for nuanced contextual description, and a variable sensitivity exponent for objective inference optimisation and subjective inference specification. The physical significance of an h*-recognised outlier is linked to the signature of unique occurrences. Our findings suggest that h* offers a robust alternative for outlier evaluation, enriching the analytical repertoire for researchers and practitioners by foregrounding the interpretative value of outliers within complex, real-world datasets. This paper is also a statement against the dominance of normality in celebration of the luminary and the lunatic alike.
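For orientation, the two conventional baselines named in the abstract can be sketched in a few lines. This is only a minimal illustration of the textbook Grubbs' G and Dixon's Q (simple gap-over-range form) statistics, not the h* statistic, whose formula is not given on this page. The function names and the sample readings are hypothetical, and critical-value lookup is omitted.

```python
import statistics


def grubbs_statistic(values):
    """Grubbs' G: largest absolute deviation from the mean, in sample SDs.

    Assumes at least two values that are not all identical.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return max(abs(v - mean) for v in values) / sd


def dixon_q(values):
    """Dixon's Q: gap between the most isolated end value and its
    nearest neighbour, divided by the full range of the data.

    Assumes at least three values with a non-zero range.
    """
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    return max(gap_low, gap_high) / data_range


if __name__ == "__main__":
    # Hypothetical readings with one visibly extreme value.
    readings = [2.1, 2.3, 2.2, 2.4, 2.0, 5.0]
    print("Grubbs' G:", grubbs_statistic(readings))
    print("Dixon's Q:", dixon_q(readings))
```

Both statistics are compared against critical values derived under normality, which is the assumption the abstract says h* avoids.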

Problem

Research questions and friction points this paper is trying to address.

Develops h* statistic for outlier analysis without normality assumption

Evaluates meaningfulness of outliers in non-normal datasets

Provides robust alternative to traditional outlier detection methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces h* statistic for outlier analysis

No normality assumption required

Extends to Bayesian inference and paired analysis
