Deflated Dynamics Value Iteration

📅 2024-07-15
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Value iteration (VI) converges slowly, at rate $O(\gamma^k)$, when the discount factor $\gamma$ is close to 1, hindering high-precision value function estimation. Method: This paper proposes an acceleration framework for VI based on matrix splitting and matrix deflation, a technique previously unexplored in dynamic programming, which spectrally removes the eigencomponents associated with the top $s$ eigenvalues of the transition matrix. Contribution/Results: The authors establish a tightened convergence rate of $\tilde{O}(\gamma^k |\lambda_{s+1}|^k)$, where $|\lambda_{s+1}| \le 1$ is the magnitude of the $(s+1)$-th largest eigenvalue, the first one not deflated. The method preserves theoretical rigor while remaining algorithmically feasible, and it extends naturally to reinforcement learning via the DDTD algorithm. Empirical evaluation across canonical MDPs and RL benchmarks demonstrates substantial acceleration and strong generalization, offering a new route to high-accuracy value function computation.

📝 Abstract
The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a function of iteration $k$ is $O(\gamma^k)$, it is slow when the discount factor $\gamma$ is close to $1$. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top $s$ dominant eigen-structure of the transition matrix $\mathcal{P}^{\pi}$. We prove that this leads to a $\tilde{O}(\gamma^k |\lambda_{s+1}|^k)$ convergence rate, where $\lambda_{s+1}$ is the $(s+1)$-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present the Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.
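To make the $O(\gamma^k)$ rate concrete, here is a minimal policy-evaluation sketch (not from the paper; the random instance and all variable names are illustrative): plain VI's error shrinks by roughly a factor of $\gamma$ per iteration, so high precision needs many iterations when $\gamma$ is near $1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 6, 0.95  # small illustrative MDP; gamma near 1 slows convergence

# Random policy-evaluation instance: row-stochastic P (playing the role of
# P^pi) and a reward vector r; the exact value solves (I - gamma P) V = r.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)
V_star = np.linalg.solve(np.eye(n) - gamma * P, r)

V = np.zeros(n)
errors = []
for _ in range(200):
    V = r + gamma * P @ V  # Bellman update
    errors.append(np.max(np.abs(V - V_star)))

# Asymptotically the error contracts by a factor of ~gamma per step,
# i.e. reaching precision eps takes on the order of log(1/eps)/(1-gamma) steps.
ratio = errors[-1] / errors[-2]
print(f"error after 200 iters: {errors[-1]:.2e}, per-step ratio ~ {ratio:.3f}")
```

The observed per-step contraction ratio approaches $\gamma$ because the dominant eigenvalue of a stochastic matrix is $1$, which is exactly the component DDVI targets for removal.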
Problem

Research questions and friction points this paper is trying to address.

Accelerate value function computation in Value Iteration
Improve slow convergence when discount factor is near 1
Remove dominant eigen-structure of transition matrix
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses matrix splitting for faster convergence
Applies matrix deflation to remove top eigenvalues
Extends to RL with Deflated Dynamics TD
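A minimal sketch of the deflation idea for the policy-evaluation case with $s = 1$ (illustrative, not the paper's code): for a row-stochastic $\mathcal{P}^{\pi}$ the dominant eigenpair is known in closed form (eigenvalue $1$, right eigenvector of all ones, left eigenvector the stationary distribution $\pi$), so it can be deflated with the rank-one matrix $E = \mathbf{1}\pi^{\top}$ and folded into a matrix splitting that preserves the fixed point while contracting at roughly $\gamma|\lambda_2|$ instead of $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 8, 0.99  # gamma close to 1: plain VI is slow

# Random policy-evaluation instance (P plays the role of P^pi).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)
V_star = np.linalg.solve(np.eye(n) - gamma * P, r)  # ground truth

# Deflate the dominant eigenpair of P: eigenvalue 1, right eigenvector of
# ones, left eigenvector pi (the stationary distribution). E = 1 pi^T.
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
E = np.outer(np.ones(n), pi)

# Matrix splitting: I - gamma P = (I - gamma E) - gamma (P - E).
# Both iterations below share the fixed point V_star, but the deflated
# one contracts at ~ gamma * |lambda_2| instead of gamma.
M = np.eye(n) - gamma * E  # invertible since gamma < 1
V_vi = np.zeros(n)
V_ddvi = np.zeros(n)
for _ in range(60):
    V_vi = r + gamma * P @ V_vi
    V_ddvi = np.linalg.solve(M, r + gamma * (P - E) @ V_ddvi)

err_vi = np.max(np.abs(V_vi - V_star))
err_ddvi = np.max(np.abs(V_ddvi - V_star))
print(f"plain VI error: {err_vi:.2e}, deflated error: {err_ddvi:.2e}")
```

For $s > 1$ the needed eigenpairs are not known in closed form; the paper combines deflation with estimated eigenstructure, which this sketch sidesteps by deflating only the trivially known top eigenpair.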