AI Summary
This work addresses the geometric mismatch between Bellman residual minimization and the \(L^\infty\)-norm contraction property of the Bellman operator in function approximation settings. To reconcile this discrepancy, the paper proposes a soft Bellman residual minimization method based on weighted \(L^p\) norms. As \(p\) increases, the optimization objective asymptotically aligns with the contraction geometry of the Bellman operator, which controls error propagation while keeping the objective amenable to gradient-based optimization. Theoretical analysis establishes a formal connection between residual minimization and contractive mappings, yielding a performance bound that depends explicitly on \(p\). The bound shows that larger values of \(p\) improve alignment with the operator's contraction properties, balancing geometric consistency with optimization compatibility.
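To make the limiting geometry concrete, the objective described above can be sketched as follows; the soft Bellman operator \(\mathcal{T}_\tau\), the weighting distribution \(\mu\), and the parameterized value function \(V_\theta\) are assumed notation for illustration rather than the paper's exact definitions:

```latex
% Assumed (illustrative) form of a weighted L^p soft Bellman residual objective:
%   V_theta : parameterized value function
%   T_tau   : a smoothed ("soft") Bellman operator
%   mu      : a weighting distribution over states
\[
  J_p(\theta)
  \;=\; \big\| V_\theta - \mathcal{T}_\tau V_\theta \big\|_{p,\mu}
  \;=\; \Big( \sum_{s} \mu(s)\,\big| V_\theta(s) - (\mathcal{T}_\tau V_\theta)(s) \big|^{p} \Big)^{1/p}.
\]
% For a finite state space, the weighted L^p norm approaches the sup norm on the
% support of mu as p grows, i.e. the norm in which the Bellman operator contracts:
\[
  \lim_{p \to \infty} \| f \|_{p,\mu}
  \;=\; \max_{s \,:\, \mu(s) > 0} |f(s)|.
\]
```

In this sense, increasing \(p\) moves the objective from the \(L^2\)-type geometry of standard residual minimization toward the \(L^\infty\) geometry in which the Bellman operator is a \(\gamma\)-contraction.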
Abstract
The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even in the linear function approximation setting. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the \(L^\infty\)-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on \(L^2\)-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted \(L^p\)-norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as \(p\) increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
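As an illustration of how such an objective admits gradient-based optimization, here is a minimal sketch, not the paper's implementation: plain gradient descent on a weighted \(L^p\) residual of a log-sum-exp soft Bellman operator with linear value features on a small random MDP. The MDP, the features, the temperature, the weighting, and the step size are all assumptions made for the example.

```python
# Minimal sketch (not the paper's implementation): gradient descent on a weighted
# L^p residual of a soft (log-sum-exp) Bellman operator with linear value features
# on a small random MDP.  The MDP, features, temperature tau, weighting mu,
# exponent p, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 20, 4, 6                     # states, actions, feature dimension
gamma, tau, p = 0.95, 0.1, 8           # discount, softmax temperature, L^p exponent

P = rng.dirichlet(np.ones(S), size=(S, A))   # transitions P[s, a, s']
R = rng.normal(size=(S, A))                  # rewards R[s, a]
Phi = rng.normal(size=(S, d))                # state features
mu = np.full(S, 1.0 / S)                     # weighting distribution over states

def loss_and_grad(theta):
    """Weighted L^p soft Bellman residual ||V - T_tau V||_{p,mu} and its gradient."""
    v = Phi @ theta
    q = R + gamma * P @ v                            # Q-values, shape (S, A)
    m = q.max(axis=1, keepdims=True)
    w = np.exp((q - m) / tau)
    pi = w / w.sum(axis=1, keepdims=True)            # soft-greedy policy pi(a|s)
    tv = (m + tau * np.log(w.sum(axis=1, keepdims=True))).ravel()  # (T_tau V)(s)
    resid = v - tv
    loss = (mu @ np.abs(resid) ** p) ** (1.0 / p)
    # d loss / d resid for the weighted L^p norm.
    g_resid = loss ** (1 - p) * mu * np.sign(resid) * np.abs(resid) ** (p - 1)
    # d resid / d theta = (I - gamma * P_pi) Phi, since the soft operator is smooth.
    P_pi = np.einsum("sa,sat->st", pi, P)            # P_pi[s, s'] = sum_a pi(a|s) P[s, a, s']
    grad = ((np.eye(S) - gamma * P_pi) @ Phi).T @ g_resid
    return loss, grad

theta = np.zeros(d)
for _ in range(3000):
    loss, grad = loss_and_grad(theta)
    theta -= 0.1 * grad
print(f"final weighted L^{p} soft Bellman residual: {loss:.4f}")
```

The smoothing in the soft operator is what makes the residual differentiable end to end; larger values of `p` weight the worst-violating states more heavily, approximating the max-norm residual while keeping a well-defined gradient.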