🤖 AI Summary
This study addresses the bias of naïve plug-in estimators for quadratic forms of linear regression coefficients with clustered data and many covariates—a problem arising in instrumental variables regression with many instruments and controls, inference on variance components, and testing multiple restrictions in a linear regression. The authors study an unbiased estimator based on leave-one-cluster-out (LOCO) cross-fitting and establish sufficient conditions for its asymptotic normality. For inference, they prove consistency of a leave-three-cluster-out variance estimator under primitive conditions, and they develop a novel leave-two-cluster-out variance estimator that is computationally simpler and guaranteed to be conservative under weaker conditions. The framework allows cluster sizes and the covariate dimension to grow with the sample size—potentially at the same rate—and accommodates strong within-cluster dependence, making it suitable for high-dimensional settings where conventional plug-in estimators are biased.
📝 Abstract
This paper studies inference for quadratic forms of linear regression coefficients with clustered data and many covariates. Our framework covers three important special cases: instrumental variables regression with many instruments and controls, inference on variance components, and testing multiple restrictions in a linear regression. Naïve plug-in estimators are known to be biased. We study a leave-one-cluster-out estimator that is unbiased, and provide sufficient conditions for its asymptotic normality. For inference, we establish the consistency of a leave-three-cluster-out variance estimator under primitive conditions. In addition, we develop a novel leave-two-cluster-out variance estimator that is computationally simpler and guaranteed to be conservative under weaker conditions. Our analysis allows cluster sizes to diverge with the sample size, accommodates strong within-cluster dependence, and permits the dimension of the covariates to diverge with the sample size, potentially at the same rate.
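The core idea—that the plug-in estimator of a quadratic form θ = β′Aβ is biased upward by a variance term, while pairing a leave-one-cluster-out fit with each cluster's own score removes that bias—can be illustrated with a small simulation. The sketch below is a hedged illustration of one simple cross-fit construction (a JIVE-style estimator with A = I and a fixed design), not the paper's exact estimator or variance formulas; all parameter values and the downdate-based LOCO computation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
G, m, p, R = 60, 4, 50, 200        # clusters, cluster size, covariates, Monte Carlo reps
n = G * m                          # p/n is not small, so the plug-in bias is visible
beta = np.ones(p) / np.sqrt(p)     # true theta = beta'beta = 1 (quadratic form with A = I)
X = rng.standard_normal((n, p))    # fixed design across replications
cluster = np.repeat(np.arange(G), m)
XtX_inv = np.linalg.inv(X.T @ X)

# Precompute leave-one-cluster-out inverses via a block downdate of X'X.
blocks = []
for g in range(G):
    Xg = X[cluster == g]
    blocks.append((Xg, np.linalg.inv(X.T @ X - Xg.T @ Xg)))

naive, loco = [], []
for _ in range(R):
    u = rng.standard_normal(G)                       # cluster random effect
    y = X @ beta + u[cluster] + rng.standard_normal(n)
    bhat = XtX_inv @ (X.T @ y)
    naive.append(bhat @ bhat)                        # plug-in: biased up by tr(Var(bhat))
    Xty = X.T @ y
    t = 0.0
    for g, (Xg, inv_mg) in enumerate(blocks):
        yg = y[cluster == g]
        sg = Xg.T @ yg                               # cluster g's own score
        b_minus_g = inv_mg @ (Xty - sg)              # OLS without cluster g
        t += b_minus_g @ (XtX_inv @ sg)              # cross-fit term: independent pieces
    loco.append(t)

print(np.mean(naive), np.mean(loco))  # plug-in drifts above 1; cross-fit centers near 1
```

The cross-fit term is unbiased because the leave-one-cluster-out coefficient depends only on errors outside cluster g, so E[β̂₋g′(X′X)⁻¹Xg′yg] = β′(X′X)⁻¹Xg′Xgβ, and summing over clusters recovers β′β exactly; the plug-in β̂′β̂ instead picks up the trace of Var(β̂), which does not vanish when p grows with n.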