Switching-Geometry Analysis of Deflated Q-Value Iteration

📅 2026-05-11
🤖 AI Summary
This study investigates the convergence rate of rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision processes. Using joint spectral radius (JSR) theory and switched-system analysis, the authors pass from the space of Q-vectors to a quotient space that removes the all-ones invariant direction. To their knowledge, this gives the first JSR-based convergence analysis of deflated Q-VI. They show that the JSR of the standard Q-VI switching-system model equals the discount factor γ, because every subsystem shares the all-ones vector as an invariant direction, whereas the JSR of the projected system on the quotient space can be strictly smaller. They further show that the deflation step is equivalent to a scalar recentering of standard Q-VI, so the sequence of greedy policies is unchanged from the same initialization. The approach can therefore give a tighter convergence-rate characterization than the classical γ-bound, while making clear that deflation only refines the geometric description of the error dynamics and does not alter the policy-optimization trajectory.
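As a rough numerical illustration of the two JSR claims, the sketch below assumes a common switching-system model in which the subsystem matrices are γ·P·Π_π over all deterministic policies π, where P maps state-action pairs to next-state distributions and Π_π picks Q(s', π(s')). The 2-state, 2-action MDP is hypothetical. For the quotient space it swaps in a standard tool, Dobrushin's ergodicity coefficient (the operator norm of a stochastic matrix in the span seminorm max(x) − min(x), which is a norm on R^n / span(1)); this only upper-bounds the projected JSR and is not the paper's computation.

```python
import itertools

# Hypothetical 2-state, 2-action MDP; the numbers are illustrative only.
# P[s][a][s2] is the probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
GAMMA = 0.9
N_S, N_A = 2, 2
N = N_S * N_A  # Q-vectors are flattened with index s * N_A + a


def subsystem(policy):
    """Row-stochastic matrix P @ Pi_policy acting on flattened Q-vectors."""
    M = [[0.0] * N for _ in range(N)]
    for s in range(N_S):
        for a in range(N_A):
            for s2 in range(N_S):
                # Transition mass lands on the entry Q(s2, policy[s2]).
                M[s * N_A + a][s2 * N_A + policy[s2]] = P[s][a][s2]
    return M


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inf_norm(A):
    return max(sum(abs(x) for x in row) for row in A)


def dobrushin(M):
    """Ergodicity coefficient of a stochastic M: its operator norm in the
    span seminorm, i.e. on the quotient space R^N / span(1)."""
    return max(0.5 * sum(abs(x - y) for x, y in zip(M[i], M[j]))
               for i in range(len(M)) for j in range(len(M)))


policies = list(itertools.product(range(N_A), repeat=N_S))
stochastic = [subsystem(pi) for pi in policies]

# Ambient space: every subsystem A = GAMMA * M has infinity-norm GAMMA (so the
# JSR is <= GAMMA) and satisfies A @ 1 = GAMMA * 1 (so the JSR is >= GAMMA).
ambient_norm = max(GAMMA * inf_norm(M) for M in stochastic)
row_sums = [sum(row) for M in stochastic for row in M]

# Quotient space: the induced map is well defined because M @ 1 = 1, and
# length-k products give the bound GAMMA * max(dobrushin(product)) ** (1 / k).
bound_1 = GAMMA * max(dobrushin(M) for M in stochastic)
bound_2 = GAMMA * max(dobrushin(matmul(M1, M2))
                      for M1 in stochastic for M2 in stochastic) ** 0.5

print(f"ambient norm bound = {ambient_norm:.3f}")
print(f"quotient bounds: k=1 -> {bound_1:.3f}, k=2 -> {bound_2:.3f}")
```

For these illustrative numbers every subsystem has Dobrushin coefficient 0.7, so the k = 1 quotient bound is 0.9 × 0.7 = 0.63, already below γ = 0.9. Because the coefficient is submultiplicative, the k = 2 bound is never larger than the k = 1 bound.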
📝 Abstract
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly equal to the discount factor $\gamma \in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
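The recentering claim rests on a shift identity: because P is row-stochastic and the max over actions commutes with adding a constant, T(Q + c·1) = T(Q) + γc·1 for the Bellman optimality operator T. Any update that subtracts a scalar multiple of the all-ones vector from T(Q) therefore stays on the standard Q-VI trajectory up to a constant offset, and the greedy policy, which only compares Q(s, ·) across actions within a state, cannot change. The sketch below checks this numerically. The MDP, the mean-residual correction rule in `deflated_step`, and the weight `ALPHA` are hypothetical stand-ins, since the abstract does not spell out the exact correction.

```python
# Hypothetical 2-state, 2-action MDP; the numbers are illustrative only.
P = [  # P[s][a][s2]: probability of moving from s to s2 under action a
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[s][a]: expected one-step reward
GAMMA = 0.9
ALPHA = 0.5  # hypothetical weight of the all-ones residual correction


def bellman(Q):
    """Standard Q-VI step: R(s,a) + GAMMA * sum_s2 P(s2|s,a) * max_a2 Q(s2,a2)."""
    v = [max(row) for row in Q]
    return [[R[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(Q[s]))]
            for s in range(len(Q))]


def deflated_step(Q):
    """Hypothetical deflated step: subtract ALPHA times the mean Bellman
    residual (a scalar multiple of the all-ones vector) from T(Q)."""
    TQ = bellman(Q)
    residuals = [t - q for t_row, q_row in zip(TQ, Q) for t, q in zip(t_row, q_row)]
    c = ALPHA * sum(residuals) / len(residuals)
    return [[t - c for t in row] for row in TQ]


def greedy(Q):
    """Greedy action index in each state."""
    return tuple(max(range(len(row)), key=row.__getitem__) for row in Q)


def span_of_difference(Q1, Q2):
    """max - min of Q1 - Q2; zero exactly when Q1 - Q2 is a constant vector,
    i.e. when Q1 and Q2 coincide in the quotient space R^n / span(1)."""
    d = [x - y for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2)]
    return max(d) - min(d)


Q_std = [[0.0, 0.0], [0.0, 0.0]]
Q_def = [[0.0, 0.0], [0.0, 0.0]]  # same initial point
spans, policies_agree = [], []
for _ in range(30):
    Q_std, Q_def = bellman(Q_std), deflated_step(Q_def)
    spans.append(span_of_difference(Q_std, Q_def))
    policies_agree.append(greedy(Q_std) == greedy(Q_def))

print("largest span of Q_std - Q_def:", max(spans))
print("greedy policies agree at every step:", all(policies_agree))
print("final constant offset Q_def - Q_std:", Q_def[0][0] - Q_std[0][0])
```

The two iterates differ by a nonzero constant, yet their projected (span-seminorm) difference stays at rounding level and the greedy sequences match. Mathematically, the same argument goes through for any scalar correction, not only this mean-residual choice.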
Problem

Research questions and friction points this paper is trying to address.

deflated Q-value iteration
joint spectral radius
Markov decision processes
convergence analysis
switching systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint spectral radius
deflated Q-value iteration
switching systems
Markov decision processes
convergence analysis