🤖 AI Summary
In epidemiology, continuous exposure variables are often categorized for analysis. When those variables are also measured with error, conventional analyses yield severely biased risk estimates and invalid confidence intervals. Existing methods either ignore the model misspecification induced by categorization or rely on strong parametric distributional assumptions. This paper proposes a simulation-free extrapolation method that corrects for categorization-induced misspecification and exposure measurement error together, with no distributional assumptions on the observed data. The approach combines regression modeling with explicit estimation of the error structure and is compatible with a range of regression frameworks. Numerical studies show that it gives unbiased point estimates and confidence intervals with valid coverage. In an analysis of Food Frequency Questionnaire data from the UK Biobank, uncorrected analyses underestimate the effect of high fat intake on BMI and obesity by at least 30% and 60%, respectively, relative to the corrected estimates. By avoiding restrictive distributional assumptions, the proposed method makes statistical inference in epidemiological studies more robust and reliable.
📝 Abstract
In epidemiological studies, it is often of interest to fit a misspecified model that categorizes continuous variables, such as calorie and nutrient intake, in order to analyze disease risk and make the model easier to interpret. When the original continuous variable is contaminated with measurement error, ignoring the error and performing a standard statistical analysis leads to severely biased point estimators and invalid confidence intervals. Although errors-in-variables is a well-known and critical issue in many areas, most existing methods for measurement error either do not account for model misspecification or rely on strong parametric distributional assumptions. To address this gap, we propose a flexible simulation-free extrapolation method that provides valid and robust statistical inference under various models and makes no distributional assumptions on the observed data. Through extensive numerical studies, we demonstrate that the proposed method provides unbiased point estimation and valid confidence intervals under various regression models. In an analysis of Food Frequency Questionnaire data from the UK Biobank, we show that ignoring measurement error underestimates the impact of high fat intake on BMI and obesity by at least 30% and 60%, respectively, compared with the results obtained after correcting for measurement error with the proposed method.
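The bias described in the abstract can be reproduced in a small simulation. The sketch below is not the proposed extrapolation method and applies no correction. It only shows, under assumed settings, how dichotomizing an error-prone exposure attenuates an estimated effect. The standard normal exposure, additive normal error, step-shaped outcome model, cutoff, and the function name `simulate_attenuation` are all assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=1.0, error_sd=0.8, cutoff=0.5, seed=0):
    """Compare the group difference in Y when the 'high exposure' group is
    defined from the true exposure X versus the error-prone measurement W."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)               # true continuous exposure (assumed N(0, 1))
    w = x + rng.normal(0.0, error_sd, n)      # observed exposure with additive error
    y = beta * (x > cutoff) + rng.normal(0.0, 1.0, n)  # outcome depends on true category

    def group_difference(high):
        # Difference in mean outcome between the "high" and "low" groups;
        # this equals the OLS slope on a single binary covariate.
        return y[high].mean() - y[~high].mean()

    oracle = group_difference(x > cutoff)     # uses the unobservable true exposure
    naive = group_difference(w > cutoff)      # ignores measurement error
    return oracle, naive


oracle, naive = simulate_attenuation()
print(f"oracle estimate: {oracle:.3f}")
print(f"naive estimate:  {naive:.3f}")
print(f"relative underestimation: {1 - naive / oracle:.1%}")
```

In this toy setting the naive estimate is pulled toward zero because measurement error moves some subjects across the cutoff, so both groups are mixtures of truly high and truly low exposure. The size of the attenuation depends entirely on the assumed error variance and cutoff, and should not be compared with the 30% and 60% figures reported for the UK Biobank analysis.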