🤖 AI Summary
This study addresses the lack of theoretical justification for principal component analysis (PCA) in high-dimensional factor models when the number of factors is overestimated. It investigates the asymptotic behavior of PCA when the working number of factors is conservatively set to any fixed \( R \geq r \), where \( r \) is the true number of factors. Leveraging anisotropic local laws from random matrix theory, the paper shows that the spurious components (those beyond the \( r \)-th) are noise-dominated, incoherent, and nearly orthogonal to the true factor loadings. Consistency of the estimated factors is established under two rotation matrices, providing the first rigorous theoretical support for the common practice of using a conservative upper bound on the number of factors. The results show that consistent factor estimates are attainable for any fixed \( R \geq r \), enabling \( \sqrt{T} \)-consistent and asymptotically normal inference on treatment effects in factor-augmented regressions.
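The behavior of the spurious components can be checked numerically. Below is a minimal NumPy sketch under an assumed toy design (iid Gaussian factors, loadings, and noise); the sizes \( T, N, r, R \) and all function names are hypothetical and not taken from the paper. It runs PCA with a working dimension \( R > r \), measures how much of each estimated loading direction lies in the column space of the true loadings, and reports the \( R^2 \) from regressing the true factors on all \( R \) estimated factors. The least-squares fit is only a stand-in for checking that the true factor space is recovered; it is not the paper's rotation \( H' \) or \( H^{+} \).

```python
import numpy as np

def simulate_factor_model(T, N, r, rng):
    # Assumed toy design: X = F Lam' + E with iid Gaussian factors, loadings and noise.
    F = rng.standard_normal((T, r))        # T x r true factors
    Lam = rng.standard_normal((N, r))      # N x r true loadings
    E = rng.standard_normal((T, N))        # T x N idiosyncratic noise
    return F @ Lam.T + E, F, Lam

def pca_with_working_dim(X, R):
    # PCA via the thin SVD of the T x N panel, keeping the top-R components.
    T = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    F_hat = np.sqrt(T) * U[:, :R]          # normalized so that F_hat' F_hat / T = I_R
    V_hat = Vt[:R].T                       # N x R unit-norm loading directions
    return F_hat, V_hat, s[:R]

def projection_norms(V, A):
    # Norm of each unit column of V after projection onto col(A): 0 = orthogonal, 1 = inside.
    Q, _ = np.linalg.qr(A)
    return np.linalg.norm(Q.T @ V, axis=0)

def span_r2(F, F_hat):
    # Uncentered R^2 from regressing each true factor on all R estimated factors.
    coef, *_ = np.linalg.lstsq(F_hat, F, rcond=None)
    resid = F - F_hat @ coef
    return 1 - (resid ** 2).sum(axis=0) / (F ** 2).sum(axis=0)

rng = np.random.default_rng(0)
T, N, r, R = 200, 400, 2, 5                # working dimension R deliberately exceeds r
X, F, Lam = simulate_factor_model(T, N, r, rng)
F_hat, V_hat, s = pca_with_working_dim(X, R)

print("top-R singular values:", np.round(s, 1))
print("leading directions in col(Lam):", np.round(projection_norms(V_hat[:, :r], Lam), 3))
print("extra directions in col(Lam):  ", np.round(projection_norms(V_hat[:, r:], Lam), 3))
print("R^2 of true factors on F_hat:  ", np.round(span_r2(F, F_hat), 3))
```

In a run of this kind one would expect the leading \( r \) directions to project almost entirely onto the loading space, the extra \( R - r \) directions to project only weakly onto it, and the \( R^2 \) values to be close to 1; the exact numbers depend on the seed and on \( T \) and \( N \).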
📝 Abstract
We develop asymptotic theory for principal component analysis (PCA) of a high-dimensional factor model in which the working dimension $R$ is fixed and only required to satisfy $R \ge r$, where $r$ is the true number of factors. Building on anisotropic local laws from random matrix theory, we show that the "extra" empirical eigencomponents beyond the $r$-th are asymptotically noise-governed, incoherent, and nearly orthogonal to the factor loadings. We introduce two rotations, an expanded $r\times R$ map $H'$ and a compressed $R\times r$ map $H^{+}$, and establish consistency of the estimated factors under both. As an application, we analyze a factor-augmented regression for treatment-effect inference and prove $\sqrt{T}$-asymptotic normality for every fixed $R \ge r$. These results provide a theoretical underpinning for the common empirical practice of adopting a conservative upper bound on the number of factors, and shift the analytical burden from consistent dimension selection to the milder requirement of bounding $r$ from above.
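The factor-augmented regression application can be illustrated in the same spirit. The sketch below assumes a simple hypothetical model, $y_t = \alpha d_t + \beta' F_t + u_t$, in which the treatment variable $d_t$ also loads on the factors. It estimates the factors by PCA for several working dimensions $R \ge r$ and reports the OLS coefficient on $d_t$ with a heteroskedasticity-robust (HC0) standard error. The model, parameter values, and function names are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def pca_factors(X, R):
    # Top-R principal-component factors of the T x N panel, normalized so F_hat' F_hat / T = I_R.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :R]

def treatment_effect(y, d, F_hat):
    # OLS of y on [1, d, F_hat]; returns the coefficient on d and its White (HC0) standard error.
    Z = np.column_stack([np.ones_like(d), d, F_hat])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - Z @ coef
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    meat = (Z * u[:, None] ** 2).T @ Z          # sum_t u_t^2 z_t z_t'
    cov = ZtZ_inv @ meat @ ZtZ_inv
    return coef[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
T, N, r, alpha = 300, 300, 2, 0.5               # hypothetical sizes and true effect
F = rng.standard_normal((T, r))
X = F @ rng.standard_normal((N, r)).T + rng.standard_normal((T, N))
d = F @ np.array([1.0, -0.5]) + rng.standard_normal(T)   # treatment confounded by the factors
y = alpha * d + F @ np.array([0.8, 0.3]) + rng.standard_normal(T)

for R in range(r, r + 4):                       # several working dimensions R >= r
    a_hat, se = treatment_effect(y, d, pca_factors(X, R))
    print(f"R={R}: alpha_hat={a_hat:.3f}, se={se:.3f}")
```

Under this toy design the estimates of $\alpha$ should stay close to one another as $R$ grows beyond $r$, which mirrors the practical message that a conservative upper bound on the number of factors need not spoil inference; the sketch says nothing about the paper's exact regularity conditions.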