An index of effective number of variables for uncertainty and reliability analysis in model selection problems

📅 2026-02-24
🤖 AI Summary
This study addresses model selection in nested models, where one must choose a complexity level such as a polynomial order, a number of clusters, or a feature subset size. To this end, the authors propose an index termed the Effective Number of Variables (ENV), inspired by the idea of the maximum area under the curve (AUC) and interpreted in the same way as effective sample size (ESS) indices. The ENV index addresses drawbacks of the elbow detectors described in the literature and provides confidence measures for the proposed solution, and it can be used jointly with information criteria such as AIC and BIC or with other model selection procedures. Comparisons with classical and recent schemes are provided in experiments on real datasets. Related MATLAB code is provided.

📝 Abstract
An index of the effective number of variables (ENV) is introduced for model selection in nested models. This setting arises, for instance, when we have to decide the order of a polynomial function or the number of bases in a nonlinear regression, choose the number of clusters in a clustering problem, or choose the number of features in a variable selection application (to name a few examples). The index is inspired by the idea of the maximum area under the curve (AUC). The ENV index is interpreted in the same way as effective sample size (ESS) indices are for a set of samples. The ENV index addresses drawbacks of the elbow detectors described in the literature and introduces different confidence measures for the proposed solution. These novel measures can also be employed jointly with different information criteria, such as the well-known AIC and BIC, or with any other model selection procedure. Comparisons with classical and recent schemes are provided in different experiments involving real datasets. Related MATLAB code is given.
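To make the nested-model setting concrete, here is a minimal Python sketch of choosing a polynomial order with AIC and BIC, the two information criteria the abstract names. It is not the ENV index, whose definition is not given in this abstract, and it is not the authors' MATLAB code. The synthetic data, the function name `nested_polynomial_scores`, and the Gaussian least-squares forms of AIC and BIC (stated up to an additive constant) are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def nested_polynomial_scores(x, y, max_order):
    """Fit polynomials of order 0..max_order by least squares.

    Returns the residual sum of squares (the error curve over the nested
    models) together with AIC and BIC under a Gaussian noise assumption.
    """
    n = len(x)
    rss, aic, bic = [], [], []
    for order in range(max_order + 1):
        coeffs = P.polyfit(x, y, order)        # coefficients, lowest degree first
        residuals = y - P.polyval(x, coeffs)
        r = float(np.sum(residuals ** 2))
        k = order + 1                          # number of fitted coefficients
        rss.append(r)
        aic.append(n * np.log(r / n) + 2 * k)          # AIC up to a constant
        bic.append(n * np.log(r / n) + k * np.log(n))  # BIC up to a constant
    return np.array(rss), np.array(aic), np.array(bic)


# Hypothetical synthetic data: a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

rss, aic, bic = nested_polynomial_scores(x, y, max_order=8)
best_aic = int(np.argmin(aic))
best_bic = int(np.argmin(bic))
print("order selected by AIC:", best_aic)
print("order selected by BIC:", best_bic)
```

Because the models are nested, the error curve `rss` cannot increase with the order, which is why a raw error curve alone does not identify a model and an elbow detector or a penalized criterion is needed. BIC penalizes each extra coefficient more heavily than AIC whenever the sample size exceeds about 7, so it never selects a higher order than AIC in this sketch.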
Problem

Research questions and friction points this paper is trying to address.

model selection
effective number of variables
uncertainty analysis
reliability analysis
nested models
Innovation

Methods, ideas, or system contributions that make the work stand out.

effective number of variables
model selection
uncertainty analysis
elbow detection
information criteria
Luca Martino
Associate Professor - University of Catania
Bayesian inference, computational methods (MCMC, particle filters, exact sampling, etc.)
Eduardo Morgado
Universidad Rey Juan Carlos, Campus de Fuenlabrada, Madrid
Roberto San Millán-Castillo
Universidad Rey Juan Carlos, Campus de Fuenlabrada, Madrid