AI Summary
This work addresses the geometric mismatch between Bellman residual minimization and the \(L^\infty\)-norm contraction property of the Bellman operator in function approximation settings. To reconcile this discrepancy, the paper proposes a soft Bellman residual minimization method based on weighted \(L^p\) norms. As \(p\) increases, the optimization objective asymptotically aligns with the contraction geometry of the Bellman operator, which controls error propagation while keeping the objective amenable to gradient-based optimization. Theoretical analysis establishes a formal connection between residual minimization and contractive mappings, yielding a performance bound that depends explicitly on \(p\). The bound shows that larger values of \(p\) improve alignment with the operator's contraction properties, balancing geometric consistency with optimization compatibility.
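To make the limiting geometry concrete, the objective described above can be sketched as follows; the soft Bellman operator \(\mathcal{T}_\tau\), the weighting distribution \(\mu\), and the parameterized value function \(V_\theta\) are assumed notation for illustration rather than the paper's exact definitions:

```latex
% Assumed (illustrative) form of a weighted L^p soft Bellman residual objective:
%   V_theta : parameterized value function
%   T_tau   : a smoothed ("soft") Bellman operator
%   mu      : a weighting distribution over states
\[
  J_p(\theta)
  \;=\; \big\| V_\theta - \mathcal{T}_\tau V_\theta \big\|_{p,\mu}
  \;=\; \Big( \sum_{s} \mu(s)\,\big| V_\theta(s) - (\mathcal{T}_\tau V_\theta)(s) \big|^{p} \Big)^{1/p}.
\]
% For a finite state space, the weighted L^p norm approaches the sup norm on the
% support of mu as p grows, i.e. the norm in which the Bellman operator contracts:
\[
  \lim_{p \to \infty} \| f \|_{p,\mu}
  \;=\; \max_{s \,:\, \mu(s) > 0} |f(s)|.
\]
```

In this sense, increasing \(p\) moves the objective from the \(L^2\)-type geometry of standard residual minimization toward the \(L^\infty\) geometry in which the Bellman operator is a \(\gamma\)-contraction.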
Abstract
The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even in the linear function approximation setting. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the \(L^\infty\)-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on \(L^2\)-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted \(L^p\)-norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as \(p\) increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
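As an illustration of how such an objective admits gradient-based optimization, here is a minimal sketch, not the paper's implementation: plain gradient descent on a weighted \(L^p\) residual of a log-sum-exp soft Bellman operator with linear value features on a small random MDP. The MDP, the features, the temperature, the weighting, and the step size are all assumptions made for the example.

```python
# Minimal sketch (not the paper's implementation): gradient descent on a weighted
# L^p residual of a soft (log-sum-exp) Bellman operator with linear value features
# on a small random MDP.  The MDP, features, temperature tau, weighting mu,
# exponent p, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 20, 4, 6                     # states, actions, feature dimension
gamma, tau, p = 0.95, 0.1, 8           # discount, softmax temperature, L^p exponent

P = rng.dirichlet(np.ones(S), size=(S, A))   # transitions P[s, a, s']
R = rng.normal(size=(S, A))                  # rewards R[s, a]
Phi = rng.normal(size=(S, d))                # state features
mu = np.full(S, 1.0 / S)                     # weighting distribution over states

def loss_and_grad(theta):
    """Weighted L^p soft Bellman residual ||V - T_tau V||_{p,mu} and its gradient."""
    v = Phi @ theta
    q = R + gamma * P @ v                            # Q-values, shape (S, A)
    m = q.max(axis=1, keepdims=True)
    w = np.exp((q - m) / tau)
    pi = w / w.sum(axis=1, keepdims=True)            # soft-greedy policy pi(a|s)
    tv = (m + tau * np.log(w.sum(axis=1, keepdims=True))).ravel()  # (T_tau V)(s)
    resid = v - tv
    loss = (mu @ np.abs(resid) ** p) ** (1.0 / p)
    # d loss / d resid for the weighted L^p norm.
    g_resid = loss ** (1 - p) * mu * np.sign(resid) * np.abs(resid) ** (p - 1)
    # d resid / d theta = (I - gamma * P_pi) Phi, since the soft operator is smooth.
    P_pi = np.einsum("sa,sat->st", pi, P)            # P_pi[s, s'] = sum_a pi(a|s) P[s, a, s']
    grad = ((np.eye(S) - gamma * P_pi) @ Phi).T @ g_resid
    return loss, grad

theta = np.zeros(d)
for _ in range(3000):
    loss, grad = loss_and_grad(theta)
    theta -= 0.1 * grad
print(f"final weighted L^{p} soft Bellman residual: {loss:.4f}")
```

The smoothing in the soft operator is what makes the residual differentiable end to end; larger values of `p` weight the worst-violating states more heavily, approximating the max-norm residual while keeping a well-defined gradient.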