QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

📅 2026-05-03
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses offline goal-conditioned reinforcement learning in partially observable, history-dependent (non-Markovian) environments, where sparse rewards provide little discriminative signal and demonstration trajectories are difficult to stitch together. The authors propose QHyer, which replaces the conventional return signal with a state-conditioned goal-reaching Q-estimator and uses a flow-based parameterization to improve behavioral stitching across trajectories. QHyer also introduces a gated hybrid Attention-Mamba backbone with content-adaptive history compression, which models both local dynamics and long-range dependencies and avoids the limitations of fixed-window extraction. Experiments show that QHyer achieves state-of-the-art performance on both Markovian and non-Markovian datasets.
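As a rough illustration of what gating between two sequence branches can look like, the sketch below blends two feature vectors with a per-feature sigmoid gate. This is a generic convex gate, not QHyer's actual layer: the names `gated_mix`, `attn_out`, `mamba_out`, and `gate_logits` are hypothetical, and the summary does not specify how the paper computes its gate.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(attn_out, mamba_out, gate_logits):
    """Blend two branch outputs feature by feature.

    Hypothetical helper: each gate value g = sigmoid(logit) weights the
    attention branch, and (1 - g) weights the Mamba branch.
    """
    mixed = []
    for a, m, logit in zip(attn_out, mamba_out, gate_logits):
        g = sigmoid(logit)
        mixed.append(g * a + (1.0 - g) * m)
    return mixed


# A logit of 0 gives g = 0.5, i.e. an even blend of the two branches.
even_blend = gated_mix([2.0], [4.0], [0.0])

# A large positive logit pushes the output toward the attention branch.
mostly_attention = gated_mix([2.0], [4.0], [20.0])
```

In a learned model the gate logits would typically be produced from the input itself, which is what would make the blend depend on content rather than on a fixed window.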
📝 Abstract
Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian dynamics that violates standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for long-term dependency modeling, yet pure attention is inefficient and brittle when handling local Markovian structure and long-range context simultaneously. Although recent hybrid architectures (e.g., LSDT) introduce local extractors to improve local dependency modeling, their fixed-window extraction cannot adapt its effective memory to varying dependency lengths in temporally heterogeneous settings, and it often truncates long-range context rather than compressing its content adaptively. Moreover, sequential offline GCRL faces a key bottleneck: under sparse rewards, return-to-go (RTG) becomes non-discriminative across sub-trajectories, providing little guidance for stitching goal-reaching behaviors from diverse demonstrations. To address these issues, we propose QHyer, which replaces RTG with a flow-parameterized, state-conditioned goal-reaching Q-estimator to support stitching across demonstrations, and introduces a gated Hybrid Attention-Mamba backbone that performs content-adaptive history compression while preserving local dynamics. Extensive experiments demonstrate that QHyer achieves state-of-the-art performance on both non-Markovian and Markovian datasets, validating its effectiveness across diverse scenarios.
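The RTG bottleneck can be seen in a small worked example. The sketch below assumes an undiscounted sparse reward of 1 on the step that reaches the goal and 0 elsewhere; that reward convention, the toy trajectories, and the helper name `returns_to_go` are illustrative assumptions, not taken from the paper.

```python
def returns_to_go(rewards):
    """Undiscounted return-to-go: the sum of rewards from each step to the end."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


# Toy trajectories under the assumed sparse reward.
reaches_goal = [0.0, 0.0, 0.0, 0.0, 1.0]   # goal reached on the last step
never_reaches = [0.0, 0.0, 0.0, 0.0, 0.0]  # goal never reached

rtg_success = returns_to_go(reaches_goal)
rtg_failure = returns_to_go(never_reaches)

# Every step of the successful trajectory gets the same RTG, and every
# step of the failed one gets another single value, so RTG alone cannot
# rank sub-trajectories by how close they come to the goal.
distinct_success_values = set(rtg_success)
distinct_failure_values = set(rtg_failure)
```

Under these assumptions the conditioning signal takes one value per trajectory, which is why a state-conditioned estimate of goal reachability carries more information for stitching than RTG does.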
Problem

Research questions and friction points this paper is trying to address.

Offline Goal-conditioned RL
Partial Observability
History Dependency
Sparse Rewards
Non-Markovian Dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Goal-conditioned RL
Hybrid Attention-Mamba
Offline Reinforcement Learning
History Compression
Q-estimator