🤖 AI Summary
This work studies how to estimate Riesz representers directly, a task central to semiparametric inference and debiased/doubly robust estimation. It compares two approaches from econometrics, automatic debiased machine learning and sieve methods for conditional moment models, which solve distinct optimization problems that share the Riesz representer as their population solution. With unregularized or ridge-regularized linear, sieve, or reproducing kernel Hilbert space (RKHS) models, the two resulting estimators are numerically equivalent. With other regularizers such as the Lasso, or with more general function classes such as neural networks, they need not coincide. In that case, the sieve formulation yields a novel constrained optimization problem for estimating Riesz representers with machine learning, which the authors conjecture may offer statistical advantages at the cost of greater computational complexity.
📝 Abstract
The Riesz representer is a central object in semiparametric statistics and debiased/doubly-robust estimation. Two literatures in econometrics have highlighted the role for directly estimating Riesz representers: the automatic debiased machine learning literature (as in Chernozhukov et al., 2022b), and an independent literature on sieve methods for conditional moment models (as in Chen et al., 2014). These two literatures solve distinct optimization problems that in the population both have the Riesz representer as their solution. We show that with unregularized or ridge-regularized linear, sieve, or RKHS models, the two resulting estimators are numerically equivalent. However, for other regularization schemes such as the Lasso, or more general machine learning function classes including neural networks, the estimators are not necessarily equivalent. In the latter case, the Chen et al. (2014) formulation yields a novel constrained optimization problem for directly estimating Riesz representers with machine learning. Drawing on results from Birrell et al. (2022), we conjecture that this approach may offer statistical advantages at the cost of greater computational complexity.
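The equivalence claim for linear models can be illustrated numerically. The sketch below is not taken from the paper: it assumes one common way of writing the two problems for a basis b(x) with coefficients ρ. The automatic debiased machine learning form is taken to be the minimizer of ρ'Gρ - 2M'ρ, and the sieve form is taken to be the minimizer of ρ'Gρ subject to M'ρ = 1, rescaled by its own squared norm. The average treatment effect functional, the four-term basis, the simulated data, and the way the ridge penalty `lam` enters (added to G in both problems) are all illustrative assumptions.

```python
import numpy as np

def basis(d, z):
    # Hypothetical basis b(d, z) chosen only for illustration.
    return np.column_stack([d, 1 - d, d * z, (1 - d) * z])

def riesz_two_ways(d, z, lam=0.0):
    """Return linear Riesz coefficients from two formulations (assumed forms)."""
    n = len(d)
    B = basis(d, z)
    k = B.shape[1]
    # Empirical Gram matrix, with an optional ridge penalty (assumption).
    G = B.T @ B / n + lam * np.eye(k)
    # Moment vector for the ATE functional: mean of b(1, z) - b(0, z).
    M = (basis(np.ones(n), z) - basis(np.zeros(n), z)).mean(axis=0)

    # Formulation 1: minimize rho' G rho - 2 M' rho  =>  rho = G^{-1} M.
    rho_direct = np.linalg.solve(G, M)

    # Formulation 2: minimize rho' G rho subject to M' rho = 1,
    # solved through its KKT linear system, then rescaled by the squared norm.
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2 * G
    kkt[:k, k] = -M
    kkt[k, :k] = M
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    rho_c = np.linalg.solve(kkt, rhs)[:k]
    rho_rescaled = rho_c / (rho_c @ G @ rho_c)

    return rho_direct, rho_rescaled

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
z = rng.normal(size=500)
d = rng.binomial(1, 0.5, size=500).astype(float)

a, b = riesz_two_ways(d, z)
print(np.allclose(a, b))
```

Under these assumptions the two coefficient vectors agree up to floating-point error, with or without the ridge term. A Lasso penalty or a neural network function class has no such closed form, so the same check would not be expected to pass in general.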