🤖 AI Summary
This paper studies nonparametric instrumental variable regression with observed covariates (NPIV-O) for identifying heterogeneous causal effects. The core challenges are: (i) observed covariates induce a partial identity structure that renders standard NPIV theory inapplicable, and (ii) the structural function exhibits anisotropic smoothness. To address these, we develop a novel theoretical framework based on a Fourier measure of partial smoothing and establish the first $L^2$ minimax lower bound for NPIV-O. We also propose KIV-O, an extension of kernelized two-stage least squares that uses Gaussian kernel lengthscales adapted to the anisotropic smoothness, and prove upper $L^2$ learning rates for it. Both the upper and lower rates interpolate between the known minimax rates for standard NPIV and nonparametric regression, but a gap remains between them, arising from the choice of kernel lengthscales tuned to minimize a projected risk. Our theory also extends to proximal causal inference.
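For concreteness, the defining conditional moment restriction of NPIV-O can be written as follows (the symbols below are our illustrative notation, not necessarily the paper's): with outcome $Y$, endogenous treatment $X$, observed covariates $O$, and instrument $Z$, the structural function $f_0$ satisfies

$$\mathbb{E}\left[\, Y - f_0(X, O) \,\middle|\, Z, O \,\right] = 0.$$

Dropping $O$ recovers standard NPIV, while taking $Z = X$ (no endogeneity) reduces the problem to nonparametric regression of $Y$ on $(X, O)$, consistent with the rates interpolating between the two settings.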
📝 Abstract
We study the problem of nonparametric instrumental variable regression with observed covariates, which we refer to as NPIV-O. Compared with standard nonparametric instrumental variable regression (NPIV), the additional observed covariates facilitate causal identification and enable estimation of heterogeneous causal effects. However, the presence of observed covariates introduces two challenges for theoretical analysis. First, it induces a partial identity structure, which renders previous NPIV analyses (based on measures of ill-posedness, stability conditions, or link conditions) inapplicable. Second, it imposes anisotropic smoothness on the structural function. To address the first challenge, we introduce a novel Fourier measure of partial smoothing; to address the second, we extend the existing kernel 2SLS instrumental variable algorithm to observed covariates, termed KIV-O, with Gaussian kernel lengthscales adapted to the anisotropic smoothness. We prove upper $L^2$ learning rates for KIV-O and the first $L^2$ minimax lower learning rates for NPIV-O. Both rates interpolate between the known optimal rates of NPIV and nonparametric regression (NPR). Interestingly, we identify a gap between our upper and lower bounds, which arises from the choice of kernel lengthscales tuned to minimize a projected risk. Our theoretical analysis also applies to proximal causal inference, an emerging framework for causal effect estimation that shares the same conditional moment restriction as NPIV-O.
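To make the two-stage structure concrete, here is a minimal single-split sketch of a KIV-O-style estimator, assuming the standard kernel instrumental variable recipe (stage 1: kernel ridge regression of the features of $X$ onto $(Z, O)$ to estimate a conditional mean embedding; stage 2: ridge regression of $Y$ on the predicted embeddings combined with covariate features). The function names, the product-kernel construction on $(Z, O)$, and the fixed per-dimension lengthscales are illustrative assumptions, not the paper's exact algorithm, which tunes the lengthscales to the anisotropic smoothness.

```python
import numpy as np

def gauss_kernel(A, B, lengthscales):
    """Anisotropic Gaussian kernel: one lengthscale per input dimension."""
    A_s, B_s = A / lengthscales, B / lengthscales
    sq = (A_s**2).sum(1)[:, None] + (B_s**2).sum(1)[None, :] - 2 * A_s @ B_s.T
    return np.exp(-0.5 * np.clip(sq, 0.0, None))

def kiv_o_fit(X, O, Z, Y, ls_x, ls_o, ls_z, lam1=1e-3, lam2=1e-3):
    """Single-split sketch of kernelized 2SLS with observed covariates.

    Stage 1: kernel ridge regression of the feature map of X onto (Z, O),
             an estimate of the conditional mean embedding.
    Stage 2: ridge regression of Y on the predicted stage-1 embeddings,
             paired with covariate features, yielding f_hat(x, o).
    """
    n = len(Y)
    K_x = gauss_kernel(X, X, ls_x)
    K_o = gauss_kernel(O, O, ls_o)
    K_zo = gauss_kernel(Z, Z, ls_z) * K_o          # product kernel on (Z, O)

    # Stage 1: column j of A holds embedding weights for training point j.
    A = np.linalg.solve(K_zo + n * lam1 * np.eye(n), K_zo)

    # Stage 2: Gram matrix of predicted features, then ridge regression on Y.
    G = (A.T @ K_x @ A) * K_o
    c = np.linalg.solve(G + n * lam2 * np.eye(n), Y)

    def predict(X_new, O_new):
        k_x = gauss_kernel(X, X_new, ls_x)          # shape (n, m)
        k_o = gauss_kernel(O, O_new, ls_o)          # shape (n, m)
        return ((A.T @ k_x) * k_o).T @ c            # f_hat at the m query points

    return predict
```

Anisotropy enters through the per-dimension lengthscales: directions in which the structural function is smoother can take larger bandwidths. Choosing those lengthscales by minimizing a projected risk is, per the abstract, exactly where the gap between the upper and lower bounds originates.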