Extension of the Dip-test Repertoire - Efficient and Differentiable p-value Calculation for Clustering

📅 2023-12-19
🏛️ SDM
📈 Citations: 1
Influential: 0
🤖 AI Summary
In univariate mode testing, the Dip-test’s p-value computation has long relied on inefficient, non-differentiable, and scale-limited look-up tables, hindering its integration into gradient-based learning frameworks. Method: This paper proposes the first analytical, differentiable mapping from the Dip statistic to its p-value. Its core is a sample-size-adaptive sigmoid function, fitted numerically, that provides a continuous, fast, end-to-end differentiable transformation from Dip statistics to p-values. Contribution/Results: The approach overcomes the discretization and sample-size constraints of conventional bootstrap-based look-up tables, enabling gradient propagation and joint optimization. Integrated into a subspace clustering framework (Dip’n’Sub), it significantly improves clustering accuracy and training stability across multiple benchmark datasets. Moreover, it accelerates p-value computation by over an order of magnitude and generalizes seamlessly to arbitrary sample sizes.
📝 Abstract
Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community, as it is a parameter-free statistical test that reliably rates the modality of one-dimensional samples. It returns a so-called Dip-value and a corresponding probability for the sample's unimodality (Dip-p-value). These two values share a sigmoidal relationship. However, the specific transformation depends on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables translating Dip- to Dip-p-values for a limited number of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every single sample size. Further, it is differentiable and can therefore easily be integrated into learning schemes using gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip'n'Sub. We highlight in extensive experiments the various benefits of our proposal.
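The sigmoidal Dip-to-p-value relationship described in the abstract can be sketched in a few lines. Note this is an illustrative toy, not the paper's fitted parametrisation: the shape coefficients `a`, `b`, `c` and the `sqrt(n)` rescaling are assumptions chosen to show the idea of one smooth curve covering every sample size.

```python
import numpy as np

def dip_to_p_value(dip, n, a=1.0, b=1.0, c=0.5):
    """Hypothetical sketch of a sample-size-adaptive sigmoid mapping
    a Dip statistic to a unimodality p-value.

    a, b, c are placeholder shape coefficients; the paper fits its own
    parametrisation numerically. The Dip statistic shrinks roughly like
    1/sqrt(n), so rescaling by sqrt(n) lets one curve serve all n.
    """
    z = np.sqrt(n) * np.asarray(dip, dtype=float)
    # Decreasing sigmoid: small Dip -> p near 1 (consistent with unimodality),
    # large Dip -> p near 0 (evidence of multimodality).
    return 1.0 / (1.0 + np.exp(a * (z - b) / c))
```

Unlike a bootstrapped look-up table, such a function returns a value for any sample size `n` and any Dip value, with no interpolation between tabulated entries.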
Problem

Research questions and friction points this paper is trying to address.

Efficient p-value calculation for Dip-test clustering
Differentiable sigmoid function replaces bootstrapped look-up tables
Enables gradient-based learning in modality-based clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Designed sigmoid function replaces look-up tables
Accelerates computation for all sample sizes
Differentiable for gradient descent integration
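The differentiability claim in the list above can be made concrete: a closed-form sigmoid has an analytic gradient everywhere, whereas a look-up table is piecewise constant with zero gradient almost everywhere. A minimal sketch, using the same hypothetical coefficients as assumptions (not the paper's fitted values):

```python
import numpy as np

def dip_to_p(dip, n, a=1.0, b=1.0, c=0.5):
    # Hypothetical sigmoid sketch (placeholder coefficients,
    # not the paper's fitted parametrisation).
    z = np.sqrt(n) * dip
    return 1.0 / (1.0 + np.exp(a * (z - b) / c))

def dip_to_p_grad(dip, n, a=1.0, b=1.0, c=0.5):
    # Closed-form derivative dp/d(dip). With u = a*(sqrt(n)*dip - b)/c
    # and p = 1/(1+exp(u)), the chain rule gives
    #   dp/d(dip) = -p*(1-p) * a*sqrt(n)/c,
    # which is defined for every dip and n, so the p-value can sit
    # inside a loss optimized by gradient descent.
    p = dip_to_p(dip, n, a, b, c)
    return -p * (1.0 - p) * a * np.sqrt(n) / c
```

The analytic gradient matches a finite-difference estimate, which is exactly the property that lets a clustering objective back-propagate through the p-value instead of treating it as a table lookup.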