A unified weighting framework for evaluating nearest neighbour classification

📅 2023-11-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the lack of a unified theoretical foundation for neighbour weighting schemes in classical nearest-neighbour (NN), fuzzy nearest-neighbour (FNN), and fuzzy-rough nearest-neighbour (FRNN) classifiers. We propose the first kernel-based unified weighting framework. Through a systematic evaluation across 85 benchmark datasets, we reveal the kernel nature of Samworth's optimal weights, introduce a novel kernel, Yager-1/2, derived from Yager's negation operator, and find that the Boscovich distance is optimal for all three classifiers. Results show that FRNN achieves the best overall performance, while NN ranks second and can be closely approximated with the simpler Yager-1/2 distance-weights. For NN and FRNN, the optimal configuration combines the Boscovich distance with Samworth rank- and distance-weights and scaling by the mean absolute deviation around the median (MAD), the standard deviation, or the semi-interquartile range; for FNN, Samworth distance-weights with MAD or standard-deviation scaling suffice. This study unifies the mechanistic understanding of weighting schemes, empirically grounds distance-metric selection, and shows how classifier configurations can be simplified.
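The simplified NN configuration the summary highlights can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Yager-1/2 kernel's closed form is assumed here from the Yager negation $N_p(x) = (1 - x^p)^{1/p}$ with $p = 1/2$, and all function names and the rescaling of distances into [0, 1] are illustrative choices.

```python
import numpy as np

def yager_half_kernel(u):
    """Assumed closed form of the Yager-1/2 kernel: the Yager negation
    N_p(x) = (1 - x^p)^(1/p) with p = 1/2 gives (1 - sqrt(u))^2 on [0, 1].
    See the paper for the exact definition."""
    u = np.clip(u, 0.0, 1.0)
    return (1.0 - np.sqrt(u)) ** 2

def boscovich_distance(x, X):
    """Boscovich (Manhattan / l1) distance from x to each row of X."""
    return np.abs(X - x).sum(axis=1)

def weighted_nn_predict(x, X_train, y_train, k=5):
    """Distance-weighted k-NN vote with Yager-1/2 distance-weights."""
    d = boscovich_distance(x, X_train)
    idx = np.argsort(d)[:k]
    d_k = d[idx]
    # Illustrative choice: rescale the k nearest distances into [0, 1]
    # so the kernel applies; the farthest of the k neighbours gets weight 0.
    u = d_k / d_k.max() if d_k.max() > 0 else np.zeros_like(d_k)
    w = yager_half_kernel(u)
    classes = np.unique(y_train)
    scores = np.array([w[y_train[idx] == c].sum() for c in classes])
    return classes[np.argmax(scores)]
```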
📝 Abstract
We present the first comprehensive and large-scale evaluation of classical (NN), fuzzy (FNN) and fuzzy rough (FRNN) nearest neighbour classification. We standardise existing proposals for nearest neighbour weighting with kernel functions, applied to the distance values and/or ranks of the nearest neighbours of a test instance. In particular, we show that the theoretically optimal Samworth weights converge to a kernel. Kernel functions are closely related to fuzzy negation operators, and we propose a new kernel based on Yager negation. We also consider various distance and scaling measures, which we show can be related to each other. Through a systematic series of experiments on 85 real-life classification datasets, we find that NN, FNN and FRNN all perform best with Boscovich distance, and that NN and FRNN perform best with a combination of Samworth rank- and distance-weights and scaling by the mean absolute deviation around the median ($r_1$), the standard deviation ($r_2$) or the semi-interquartile range ($r_\infty^*$), while FNN performs best with only Samworth distance-weights and $r_1$- or $r_2$-scaling. However, NN achieves comparable performance with Yager-$\frac{1}{2}$ distance-weights, which are simpler to implement than a combination of Samworth distance- and rank-weights. Finally, FRNN generally outperforms NN, which in turn performs systematically better than FNN.
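The "theoretically optimal Samworth weights" the abstract refers to are the asymptotically optimal rank weights of Samworth (2012). The closed form below is reproduced from that paper from memory, not from this abstract, so treat it as an assumption; the function name is illustrative.

```python
import numpy as np

def samworth_weights(k_star, d):
    """Samworth's (2012) asymptotically optimal rank weights:
        w_i = (1/k*) * [1 + d/2 - d/(2 k*^(2/d)) * (i^(1+2/d) - (i-1)^(1+2/d))]
    for i = 1..k*, zero beyond, where d is the feature dimension.
    The i-dependent term telescopes, so the weights sum to 1."""
    i = np.arange(1, k_star + 1, dtype=float)
    a = 1.0 + 2.0 / d
    return (1.0 / k_star) * (
        1.0 + d / 2.0
        - d / (2.0 * k_star ** (2.0 / d)) * (i ** a - (i - 1) ** a)
    )
```

Because $i^{1+2/d} - (i-1)^{1+2/d}$ grows with $i$, the weights decrease with neighbour rank, which is the decaying shape that the paper shows converges to a kernel.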
Problem

Research questions and friction points this paper is trying to address.

Evaluates NN, FNN, FRNN classification with unified weighting
Standardizes kernel-based weighting for distance and rank
Compares performance across 85 datasets and metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized kernel-based weighting for nearest neighbors
Proposed Yager negation-based kernel function
Optimal Boscovich distance for NN, FNN, FRNN
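The scaling measures named in the abstract — mean absolute deviation around the median ($r_1$), standard deviation ($r_2$), and semi-interquartile range ($r_\infty^*$) — can be sketched as per-feature rescalings. This is a minimal illustration; the function name, keyword values, and centring choices are assumptions, not the paper's code.

```python
import numpy as np

def scale_features(X, method="mad"):
    """Rescale each feature by one spread measure from the abstract:
    'mad'  - mean absolute deviation around the median (r1),
    'std'  - standard deviation (r2),
    'siqr' - semi-interquartile range (r_inf*).
    Names and centring are illustrative choices."""
    X = np.asarray(X, dtype=float)
    if method == "mad":
        center = np.median(X, axis=0)
        spread = np.mean(np.abs(X - center), axis=0)
    elif method == "std":
        center = X.mean(axis=0)
        spread = X.std(axis=0)
    elif method == "siqr":
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        center = np.median(X, axis=0)
        spread = (q3 - q1) / 2.0
    else:
        raise ValueError(f"unknown method: {method}")
    spread = np.where(spread == 0, 1.0, spread)  # constant features: leave unscaled
    return (X - center) / spread
```

Median-based centring with MAD or SIQR spread is robust to outliers, which is one plausible reason these measures compete with plain standardisation in the paper's experiments.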
O. Lenz
Research Group for Computational Web Intelligence, Department of Applied Mathematics, Computer Science and Statistics, Ghent University
Henri Bollaert
Research Group for Computational Web Intelligence, Department of Applied Mathematics, Computer Science and Statistics, Ghent University
Chris Cornelis
Associate professor, Ghent University
Artificial intelligence, machine learning, fuzzy sets, rough sets, recommender systems