🤖 AI Summary
This paper studies infinite-horizon discounted Markov decision processes (MDPs) with a fast-slow two-timescale structure, common in applications such as inventory control and dynamic pricing, where high-frequency decisions push the discount factor toward one and create severe computational bottlenecks. To address this, the authors propose a hierarchical "slow-state freezing" approximation framework: a lower-level solver computes finite-horizon subproblems with the slow state frozen, while an upper-level value iteration operates on the slow timescale. They characterize the regret induced by freezing, establishing for the first time an explicit trade-off between computational budget and policy performance, and prove that the framework systematically corrects the bias of heuristics that ignore slow-state dynamics. Experiments on inventory management and dynamic pricing tasks show that the method achieves policy quality comparable to exact full-state MDP solvers at substantially lower computational cost.
📝 Abstract
We study infinite-horizon Markov decision processes (MDPs) with "fast-slow" structure, where some state variables evolve rapidly ("fast states") while others change more gradually ("slow states"). Such structure is common in real-world problems where sequential decisions need to be made at high frequencies over long horizons, while slowly evolving information also influences optimal decisions. Examples include inventory control under slowly changing demand, or dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, making them computationally challenging. We propose a novel approximation strategy that "freezes" slow states during a phase of lower-level planning, solving finite-horizon MDPs conditioned on a fixed slow state, and then applying value iteration to an auxiliary upper-level MDP that evolves on a slower timescale. Freezing states for short periods of time leads to easier-to-solve lower-level problems, while a slower upper-level timescale allows for a more favorable discount factor. On the theoretical side, we analyze the regret incurred by our frozen-state approach, which leads to simple insights on how to trade off computational budget versus regret. Empirically, we demonstrate that frozen-state methods produce high-quality policies with significantly less computation, and we show that simply omitting slow states is often a poor heuristic.
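The hierarchy described above can be sketched on a toy problem. The code below is a minimal illustration, not the paper's exact construction: the state sizes, random dynamics, freezing horizon `T`, and the simplification that the upper-level value depends only on the slow state (with a uniform average over fast states standing in for a fast-state distribution) are all assumptions made for exposition. It shows the key computational point: because the upper-level value enters the lower-level backward induction only at the end of a `T`-period block, the upper-level operator contracts with modulus `gamma**T` rather than `gamma`.

```python
import numpy as np

# Illustrative toy fast-slow MDP; all sizes and dynamics are assumptions.
rng = np.random.default_rng(0)
n_fast, n_slow, n_act = 4, 3, 2   # fast states, slow states, actions
T = 5                             # freezing horizon (lower-level length)
gamma = 0.99                      # per-period discount, close to one

# Fast dynamics P_fast[a, f, f'], slow dynamics P_slow[s, s'] (one slow
# transition per block of T fast periods), rewards r[s, f, a].
P_fast = rng.random((n_act, n_fast, n_fast))
P_fast /= P_fast.sum(axis=2, keepdims=True)
P_slow = rng.random((n_slow, n_slow))
P_slow /= P_slow.sum(axis=1, keepdims=True)
r = rng.random((n_slow, n_fast, n_act))

def lower_level(s, V_up):
    """Backward induction over T periods with the slow state frozen at s.

    The terminal value is the expected upper-level value after one slow
    transition, so V_up is discounted by gamma**T overall: the upper-level
    operator contracts with modulus gamma**T instead of gamma.
    """
    V = np.full(n_fast, P_slow[s] @ V_up)   # E[V_up(s')] at the block end
    for _ in range(T):
        Q = r[s] + gamma * np.einsum("afg,g->fa", P_fast, V)
        V = Q.max(axis=1)                   # greedy over actions
    return V

# Upper-level value iteration on the slow timescale: effective discount
# gamma**T ~ 0.95, far more favorable than gamma = 0.99.
V_up = np.zeros(n_slow)
for _ in range(1000):
    V_new = np.array([lower_level(s, V_up).mean() for s in range(n_slow)])
    done = np.max(np.abs(V_new - V_up)) < 1e-9
    V_up = V_new
    if done:
        break
```

Lower-level problems are cheap (a length-`T` backward pass per slow state), and the upper-level iteration needs far fewer sweeps to converge than value iteration at discount `gamma` on the full joint state space.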