Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

📅 2025-02-28
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address the poor decision-making efficiency of offline reinforcement learning in stochastic environments with high-dimensional continuous action spaces, this paper proposes a macro-action learning framework based on a state-conditioned Vector Quantized Variational Autoencoder (VQ-VAE). The method encodes continuous action sequences into discrete, plannable latent actions and introduces an independent latent-space prior model to capture latent-state transition dynamics, enabling efficient and robust sequential decision-making via Monte Carlo Tree Search (MCTS). Our key contribution is the first integration of a VQ-VAE with latent-space dynamics modeling for offline macro-action abstraction, which effectively mitigates the distributional shift arising from both behavior-policy mismatch and environmental stochasticity. Evaluated on stochastic continuous-control and high-dimensional dexterous-manipulation tasks, our approach significantly reduces decision latency, improves stability, outperforms leading model-based offline RL methods, and matches the performance of strong model-free actor-critic baselines.
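
As a concrete illustration of the encoding step, the sketch below shows one way a state-conditioned VQ-VAE over short action sequences could look. It is a minimal PyTorch sketch under assumed choices, not the paper's implementation: the class name StateConditionedVQVAE, the MLP encoder/decoder, and all sizes (latent_dim, codebook_size, horizon) are illustrative.

```python
# Minimal sketch of a state-conditioned VQ-VAE over action sequences.
# Architecture, names, and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateConditionedVQVAE(nn.Module):
    def __init__(self, state_dim, action_dim, horizon, latent_dim=64, codebook_size=128):
        super().__init__()
        # Encoder: (state, flattened action sequence) -> continuous latent
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim * horizon, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Discrete codebook: each entry is one learnable macro-action embedding
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        # Decoder: (state, quantized latent) -> reconstructed action sequence
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim * horizon),
        )

    def quantize(self, z):
        # Nearest-neighbour lookup: continuous latent -> discrete code index
        dists = torch.cdist(z, self.codebook.weight)   # (B, codebook_size)
        idx = dists.argmin(dim=-1)                     # discrete latent action
        return self.codebook(idx), idx

    def forward(self, state, actions):
        z = self.encoder(torch.cat([state, actions.flatten(1)], dim=-1))
        z_q, idx = self.quantize(z)
        z_st = z + (z_q - z).detach()  # straight-through estimator
        recon = self.decoder(torch.cat([state, z_st], dim=-1))
        return recon.view_as(actions), z, z_q, idx

def vqvae_loss(model, state, actions, beta=0.25):
    recon, z, z_q, _ = model(state, actions)
    rec = F.mse_loss(recon, actions)            # reconstruct the macro-action
    codebook = F.mse_loss(z_q, z.detach())      # pull codes toward encoder outputs
    commit = F.mse_loss(z, z_q.detach())        # keep encoder committed to its code
    return rec + codebook + beta * commit
```

After training, planning only ever manipulates the integer code idx; the decoder maps a chosen code, together with the current state, back into an executable sequence of continuous actions.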

📝 Abstract
Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present Latent Macro Action Planner (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a (separate) learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings, including stochastic continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns. Empirical results demonstrate that L-MAP maintains low decision latency despite increased action dimensionality. Notably, across tasks ranging from continuous control with inherently stochastic dynamics to high-dimensional robotic hand manipulation, L-MAP significantly outperforms existing model-based methods and performs on par with strong model-free actor-critic baselines, highlighting the effectiveness of the proposed approach for planning in complex, stochastic environments with high-dimensional action spaces.
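
The "(separate) learned prior model" can be pictured as a state-conditioned distribution over codebook indices, trained (for example) by cross-entropy against the codes the VQ-VAE assigns to dataset action sequences. The sketch below is an assumed minimal form; LatentPrior and its interface are hypothetical, not the paper's API.

```python
# Hedged sketch: a state-conditioned categorical prior over codebook indices.
# Assumed training signal: F.cross_entropy(prior(state), code_idx), where
# code_idx is the index the VQ-VAE quantizer assigns to the dataset sequence.
import torch
import torch.nn as nn

class LatentPrior(nn.Module):
    """p(code | state): proposes plausible discrete macro-actions."""
    def __init__(self, state_dim, codebook_size, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, codebook_size),
        )

    def forward(self, state):
        return self.net(state)  # unnormalized logits over codebook entries

    @torch.no_grad()
    def sample(self, state, k=8, temperature=1.0):
        # Propose k distinct candidate codes; restricting the search to
        # prior-likely codes keeps planning inside the data's support,
        # which is what combats distributional shift in the offline setting.
        probs = torch.softmax(self(state) / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=k, replacement=False)
```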
Problem

Research questions and friction points this paper is trying to address.

Addresses computational challenges in high-dimensional continuous action spaces.
Reduces action dimensionality using learned temporal abstraction.
Improves decision-making in stochastic environments with efficient planning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns macro-actions via VQ-VAE for dimensionality reduction
Uses latent transition model for efficient action sampling
Applies Monte Carlo tree search for stochastic planning (see the sketch after this list)
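
Putting the pieces together, the sketch below shows a deliberately simplified, one-level MCTS loop over discrete latent actions. Everything here is an assumed interface rather than the paper's algorithm: prior is the hypothetical LatentPrior above, and env_model is a stand-in learned model whose step(state, code) returns a sampled (next_state, cumulative_reward) pair for the macro-action and whose value(state) returns a bootstrap estimate.

```python
# Simplified one-level MCTS over discrete latent macro-actions (illustrative).
# Because env_model.step re-samples each transition, repeated visits to the
# same code average returns over the environment's stochasticity.
import math

def plan(root_state, prior, env_model, num_sims=64, depth=4, gamma=0.99, c=1.4):
    codes = prior.sample(root_state, k=8).squeeze(0).tolist()
    visits = {a: 0 for a in codes}
    values = {a: 0.0 for a in codes}
    for _ in range(num_sims):
        total = sum(visits.values()) + 1
        # UCB1: balance exploiting high-value codes and exploring rare ones
        a = max(codes, key=lambda a: values[a]
                + c * math.sqrt(math.log(total) / (visits[a] + 1e-8)))
        next_state, reward = env_model.step(root_state, a)
        ret = reward + gamma * rollout(next_state, prior, env_model, depth - 1, gamma)
        visits[a] += 1
        values[a] += (ret - values[a]) / visits[a]   # running-mean backup
    return max(values, key=values.get)               # best latent macro-action

def rollout(state, prior, env_model, depth, gamma):
    # Roll forward with prior-sampled codes; bootstrap with a learned value
    if depth == 0:
        return env_model.value(state)
    a = prior.sample(state, k=1).item()
    next_state, reward = env_model.step(state, a)
    return reward + gamma * rollout(next_state, prior, env_model, depth - 1, gamma)
```

A full implementation would grow a tree with chance nodes for sampled next states; the flat loop above only conveys why discretizing the action space makes the search tractable.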