Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem

πŸ“… 2026-04-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the geometric mismatch between Bellman residual minimization and the infinity-norm contraction property of the Bellman operator in function approximation settings. To reconcile this discrepancy, the paper proposes a soft Bellman residual minimization method based on weighted \(L^p\) norms. By increasing the value of \(p\), the optimization objective asymptotically aligns with the contraction geometry of the Bellman operator, thereby effectively controlling error propagation while preserving the feasibility of gradient-based optimization. Theoretical analysis establishes a formal connection between residual minimization and contractive mappings, yielding a performance bound that explicitly depends on \(p\). This bound demonstrates that larger \(p\) values enhance alignment with the operator’s contraction characteristics, achieving a unified balance between geometric consistency and optimization compatibility.
πŸ“ Abstract
The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even in linear function approximation settings. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the \(L^\infty\)-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on \(L^2\)-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted \(L^p\)-norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as \(p\) increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
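The abstract's idea can be sketched in code: a weighted \(L^p\)-norm of a soft Bellman residual, which approaches the sup-norm (the geometry in which the Bellman operator contracts) as \(p\) grows. This is a minimal tabular sketch, not the paper's method; the function name, the log-sum-exp "soft" backup, and the temperature `tau` are all assumptions for illustration.

```python
import numpy as np

def soft_lp_bellman_residual(q, rewards, transitions, weights,
                             gamma=0.99, p=4, tau=0.1):
    """Weighted L^p norm of a soft Bellman residual (tabular sketch).

    q:           (S, A) action-value table
    rewards:     (S, A) expected immediate rewards
    transitions: (S, A, S) transition probabilities P(s' | s, a)
    weights:     (S, A) weighting mu over state-action pairs (sums to 1)
    tau:         temperature of the soft (log-sum-exp) backup; an assumption,
                 used here as a smooth, differentiable stand-in for max_a'
    """
    # Soft backup value: tau * log sum_a' exp(q(s', a') / tau)  -> max_a' q as tau -> 0
    soft_value = tau * np.log(np.exp(q / tau).sum(axis=1))        # shape (S,)
    # Bellman residual: r(s, a) + gamma * E_{s'}[soft_value(s')] - q(s, a)
    residual = rewards + gamma * (transitions @ soft_value) - q   # shape (S, A)
    # Weighted L^p norm of the residual; as p -> infinity this approaches the
    # sup norm, matching the L^infinity contraction of the Bellman operator
    return float((weights * np.abs(residual) ** p).sum() ** (1.0 / p))
```

Because the norm is a smooth function of `q` for finite `p`, it remains compatible with gradient-based optimization while increasingly penalizing the worst-case residual as `p` grows.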
Problem

Research questions and friction points this paper is trying to address.

Markov Decision Process
Bellman Residual Minimization
Function Approximation
Contraction Mapping
Lp-Norm
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bellman residual minimization
weighted Lp-norm
contraction alignment
function approximation
Markov decision processes
Hyukjun Yang
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, South Korea
Han-Dong Lim
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, South Korea
Donghwan Lee
KAIST
Decision making, control, and optimization