Proper Correlation Coefficients for Nominal Random Variables

📅 2025-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses key limitations of classical dependence measures such as the contingency coefficient, Goodman-Kruskal's lambda and tau, and the uncertainty coefficient: they lack an intuitive definition of “perfect dependence” for nominal variables, they require two discrete marginal distributions and so are undefined when one variable is continuous, and they can yield values close to 0 even when the dependence is strong or perfect. To resolve these issues, the paper proposes a dependence framework grounded in conditional distribution reconstruction and probability kernels. It formally defines perfect dependence for settings involving at least one nominal variable and constructs a family of correlation coefficients taking values in [0,1] that attain 1 if and only if perfect dependence holds. The paper establishes attainability for all marginal distributions, derives the asymptotic distribution of an estimator to enable statistical inference, examines finite-sample performance in simulations, and illustrates the measure in two applications: the dependence between country and income, and between country and religion.

📝 Abstract

This paper develops an intuitive concept of perfect dependence between two variables, at least one of which has a nominal scale, that is attainable for all marginal distributions, and proposes a set of dependence measures that equal 1 if and only if this perfect dependence is satisfied. The advantages of these dependence measures relative to classical dependence measures like contingency coefficients, Goodman-Kruskal's lambda and tau, and the so-called uncertainty coefficient are twofold. Firstly, they are defined even if one of the variables is real-valued and has a continuous distribution. Secondly, they satisfy the property of attainability: they can take all values in the interval [0,1] irrespective of the marginals involved. Neither property is shared by the classical dependence measures, which need two discrete marginal distributions and can in some situations yield values close to 0 even though the dependence is strong or even perfect. Additionally, I provide a consistent estimator for one of the new dependence measures together with its asymptotic distribution under independence as well as in the general case. This allows the construction of confidence intervals and an independence test, whose finite-sample performance I subsequently examine in a simulation study. Finally, I illustrate the use of the new dependence measure in two applications, on the dependence between the variables country and income and between country and religion.
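The attainability problem of the classical measures can be made concrete with a small sketch. The code below is not the paper's new measure (its definition is not given on this page); it only computes two of the classical measures named in the abstract, Pearson's contingency coefficient and Goodman-Kruskal's lambda, on two hypothetical 2x2 tables chosen for illustration. The function names and the tables are assumptions, not taken from the paper.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic; assumes all row and column totals are positive."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return sqrt(chi2 / (chi2 + n))

def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    col_tot = [sum(col) for col in zip(*table)]
    best_marginal = max(col_tot)                      # best guess ignoring the row
    best_conditional = sum(max(row) for row in table) # best guess within each row
    return (best_conditional - best_marginal) / (n - best_marginal)

# Hypothetical table 1: each row determines the column exactly.
perfect = [[50, 0],
           [0, 50]]

# Hypothetical table 2: rows differ in their column distribution,
# but the same column is the most frequent one in both rows.
skewed = [[40, 10],
          [30, 20]]

print(contingency_coefficient(perfect))  # below 1 despite exact determination
print(gk_lambda(skewed))                 # 0 although chi_square(skewed) > 0
print(chi_square(skewed))
```

In the first table the contingency coefficient equals sqrt(1/2) rather than 1, because its maximum for a k x k table is sqrt((k-1)/k). In the second table lambda is exactly 0 even though the two rows have different conditional distributions, since the modal column is the same in both rows.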

Problem

Research questions and friction points this paper is trying to address.

Defining perfect dependence for nominal variables in a way that is attainable for all marginal distributions

Extending dependence measures to cases where one variable is real-valued and continuous

Providing a consistent estimator and its asymptotic distribution for the new dependence measures

Innovation

Methods, ideas, or system contributions that make the work stand out.

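Defines an intuitive concept of perfect dependence for pairs of variables with at least one nominal variable

Proposes measures that can take every value in [0,1] for any marginals and remain defined when one variable is continuous

Provides a consistent estimator with its asymptotic distribution, enabling confidence intervals and an independence test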
