Generalized Jeffreys's approximate objective Bayes factor: Model-selection consistency, finite-sample accuracy, and statistical evidence in 71,126 clinical trial findings

📅 2025-10-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address widespread misuse of p-values and misinterpretation of statistical significance, this paper defines eJAB, a generalized form of Jeffreys's approximate objective Bayes factor. It is computable in one line from only the p-value, sample size, and parameter dimension, and it gives an interpretable measure of evidence strength. The authors establish conditions under which eJAB is model-selection consistent and verify them for ten statistical tests, including t-tests and F-tests. Twelve simulation studies assess its finite-sample accuracy against Bayes factors computed by Markov chain Monte Carlo. Applied to 71,126 results from ClinicalTrials.gov, eJAB flags 4,088 candidate type I errors (results with p-value ≤ α for which eJAB nonetheless favors the null) and 487 instances of the Jeffreys–Lindley paradox. An estimated 75% of trial plans set α ≥ 0.05 as the target evidence threshold, and 35.5% of results significant at α = 0.05 carry evidence no stronger than anecdotal (eJAB < 3). eJAB thus offers a broadly applicable, easy-to-implement, objective Bayesian alternative to p-values.

📝 Abstract

Concerns about the misuse and misinterpretation of p-values and statistical significance have motivated alternatives for quantifying evidence. We define a generalized form of Jeffreys's approximate objective Bayes factor (eJAB), a one-line calculation that is a function of the p-value, sample size, and parameter dimension. We establish conditions under which eJAB is model-selection consistent and verify them for ten statistical tests. We assess finite-sample accuracy by comparing eJAB with Markov chain Monte Carlo computed Bayes factors in 12 simulation studies. We then apply eJAB to 71,126 results from ClinicalTrials.gov (CTG) and find that the proportion of findings with $\text{p-value} \le \alpha$ yet $\text{eJAB}_{01} > 1$ (favoring the null) closely tracks the significance level $\alpha$, suggesting that such contradictions point to type I errors. We catalog 4,088 such candidate type I errors and provide details for 131 with reported $\text{p-value} \le 0.01$. We also identify 487 instances of the Jeffreys-Lindley paradox. Finally, we estimate that 75% (6%) of clinical trial plans from CTG set $\alpha \ge 0.05$ as the target evidence threshold, and that 35.5% (0.22%) of results significant at $\alpha = 0.05$ correspond to evidence that is no stronger than anecdotal under eJAB.
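The screening rule described in the abstract (a result is a candidate type I error when it is significant at α yet the Bayes factor favors the null) can be sketched in a few lines. This is only an illustration under stated assumptions: the generalized eJAB formula is not reproduced in this summary, so the hypothetical helper `jab01_one_parameter` substitutes a simpler one-parameter approximation, $\sqrt{n}\,\exp(-W/2)$, with $W$ the chi-square(1) statistic implied by a two-sided p-value. The function names and the example inputs are invented for illustration and are not taken from the paper.

```python
from math import exp, sqrt
from statistics import NormalDist


def jab01_one_parameter(p_value: float, n: int) -> float:
    """Stand-in approximate Bayes factor for H0 over H1 (one tested parameter).

    ASSUMPTION: uses sqrt(n) * exp(-W / 2), where W is the chi-square(1)
    statistic implied by a two-sided p-value. This is NOT the paper's
    generalized eJAB formula, which is not given in this summary.
    """
    if not 0.0 < p_value < 1.0 or n < 1:
        raise ValueError("need 0 < p_value < 1 and n >= 1")
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)  # |z| for a two-sided test
    w = z * z  # chi-square quantile with 1 degree of freedom
    return sqrt(n) * exp(-0.5 * w)


def is_candidate_type_i_error(p_value: float, bf01: float, alpha: float = 0.05) -> bool:
    """Flag results significant at alpha whose Bayes factor favors the null."""
    return p_value <= alpha and bf01 > 1.0


if __name__ == "__main__":
    # Hypothetical (p-value, sample size) pairs, not data from the paper.
    for p, n in [(0.04, 50), (0.04, 10_000), (0.001, 10_000)]:
        bf01 = jab01_one_parameter(p, n)
        print(p, n, round(bf01, 3), is_candidate_type_i_error(p, bf01))
```

Under this stand-in, the same p-value of 0.04 is flagged at n = 10,000 but not at n = 50, because the Bayes factor in favor of the null grows with $\sqrt{n}$ while the p-value stays fixed. Any analysis meant to match the paper's counts would need the actual eJAB expression, including its dependence on parameter dimension.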

Problem

Research questions and friction points this paper is trying to address.

Developing a generalized Bayes factor alternative to p-values

Establishing model selection consistency for statistical tests

Assessing evidence reliability in clinical trial findings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Jeffreys's approximate objective Bayes factor

One-line calculation using p-value, sample size, dimension

Model-selection consistency verified for ten statistical tests
