🤖 AI Summary
This work addresses online policy adaptation in contextual Markov decision processes (CMDPs), where the underlying context is unobserved. Because the exact posterior over contexts is intractable to compute, the authors propose MATE, a lightweight memory architecture that sums the embeddings of the transitions observed so far. Since the posterior is invariant to the order of those transitions, the sum-aggregated memory retains sufficient expressiveness, while avoiding the growing per-step rollout cost of Transformers and the gradient issues commonly associated with RNNs. Across multiple benchmarks, the method performs comparably to standard sequence-model baselines at lower computational cost.
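The permutation invariance mentioned above can be made concrete with a short derivation. This is a sketch in assumed notation, not the paper's own: $c$ is the context, $P_c$ and $R_c$ are the context-dependent transition and reward models, and the agent's action choice is assumed to depend on the history only, so policy terms do not depend on $c$ and drop out of the posterior. The initial-state distribution is also assumed to be context-independent.

```latex
% Posterior over contexts after t transitions (s_i, a_i, r_i, s_{i+1}),
% under the assumptions stated in the lead-in.
p(c \mid s_1, a_1, r_1, \dots, s_{t+1})
  \;\propto\; p(c) \prod_{i=1}^{t} P_c(s_{i+1} \mid s_i, a_i)\, R_c(r_i \mid s_i, a_i)

% Taking logs turns the product into a sum over transitions:
\log p(c \mid \cdot)
  = \log p(c) + \sum_{i=1}^{t} \log\!\big[ P_c(s_{i+1} \mid s_i, a_i)\, R_c(r_i \mid s_i, a_i) \big] + \text{const}
```

Because products and sums are commutative, reordering the transitions leaves the posterior unchanged, which is the property a sum-aggregated memory shares.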
📝 Abstract
We propose MATE, a simple yet effective memory architecture for solving Contextual Markov Decision Processes (CMDPs), a family of MDPs parameterized by an unobserved context. In CMDPs, an optimal agent can adapt online by maintaining the posterior belief over contexts. MATE replaces this intractable posterior with a sum-aggregated memory, leveraging the posterior's permutation invariance to retain provably sufficient expressiveness. Compared to prior memory architectures, MATE avoids the growing per-step rollout cost of Transformers and the gradient issues commonly associated with Recurrent Neural Networks (RNNs). Extensive evaluations across diverse benchmarks demonstrate that MATE provides clear computational advantages while achieving performance comparable to standard sequence-model baselines.
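As an illustration of what a sum-aggregated memory looks like in code, here is a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the single-layer `embed` network with fixed random weights, and the helper names `update_memory` and `aggregate` are all hypothetical. It shows only the two properties the abstract relies on, namely a constant-cost per-step update and invariance to the order of transitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, EMBED_DIM = 3, 2, 8
IN_DIM = STATE_DIM + ACTION_DIM + 1 + STATE_DIM  # flattened (s, a, r, s')

# Hypothetical embedding network: one tanh layer with fixed random weights.
# In practice this would be a learned network.
W = rng.normal(size=(IN_DIM, EMBED_DIM))
b = rng.normal(size=EMBED_DIM)


def embed(transition):
    """Map flattened transition(s) (s, a, r, s') to embedding vector(s)."""
    return np.tanh(transition @ W + b)


def update_memory(memory, transition):
    """Constant-cost update: add the new transition's embedding to the running sum."""
    return memory + embed(transition)


def aggregate(transitions):
    """Fold a sequence of transitions into a single memory vector."""
    memory = np.zeros(EMBED_DIM)
    for transition in transitions:
        memory = update_memory(memory, transition)
    return memory


# Ten random stand-in transitions, aggregated in two different orders.
transitions = rng.normal(size=(10, IN_DIM))
shuffled = transitions[rng.permutation(len(transitions))]

memory_in_order = aggregate(transitions)
memory_shuffled = aggregate(shuffled)

# The memory is the same up to floating-point rounding.
order_invariant = np.allclose(memory_in_order, memory_shuffled)
```

The memory has a fixed size (`EMBED_DIM`) regardless of how many transitions have been seen, so the per-step cost does not grow with the episode length, unlike attention over the full history.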