Sable: a Performant, Efficient and Scalable Sequence Model for MARL

📅 2024-10-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing the challenge of jointly optimizing performance, memory efficiency, and scalability in multi-agent reinforcement learning (MARL), this paper proposes Sable, a novel sequence model. Methodologically, Sable adapts the retention mechanism of Retentive Networks to MARL temporal modeling, integrating lightweight state compression, distributed sequence attention approximation, and multi-agent temporal embedding alignment for efficient long-horizon contextual modeling. Empirically, Sable achieves state-of-the-art (SOTA) performance on 34 of 45 tasks across six diverse MARL benchmark environments, scales to over 1,000 agents with linear memory growth, and maintains stable performance at large scale. Ablation studies confirm that each component contributes to both computational efficiency and representational capacity.

๐Ÿ“ Abstract
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency and (3) scalability. In this work, we introduce Sable, a performant, memory efficient and scalable sequence modeling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage.
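The retention mechanism that Sable adapts comes from Retentive Networks, which replace softmax attention with an exponentially decaying recurrence: a fixed-size state is updated per step as S_n = γ·S_{n-1} + K_nᵀV_n, and the output is O_n = Q_n·S_n. This recurrent form is mathematically equivalent to a parallel form with a decay mask, but uses memory independent of sequence length, which is the key to the long-context, linear-memory behavior described in the abstract. Below is a minimal NumPy sketch of this standard retention recurrence; it illustrates the mechanism only, not Sable's actual multi-agent implementation, and the function names and γ value are illustrative.

```python
import numpy as np

def retention_recurrent(Q, K, V, gamma=0.9):
    """Recurrent form of retention: S_n = gamma * S_{n-1} + K_n^T V_n,
    O_n = Q_n S_n. The state S is a fixed (d x d) matrix, so memory
    does not grow with sequence length."""
    T, d = Q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(V)
    for n in range(T):
        S = gamma * S + np.outer(K[n], V[n])  # accumulate decayed key-value state
        out[n] = Q[n] @ S                     # read out with the current query
    return out

def retention_parallel(Q, K, V, gamma=0.9):
    """Equivalent parallel form: O = (Q K^T * D) V, where
    D[n, m] = gamma^(n - m) for n >= m and 0 otherwise."""
    T = Q.shape[0]
    idx = np.arange(T)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return ((Q @ K.T) * D) @ V
```

The two forms produce identical outputs; training can use the parallel form for throughput while inference uses the recurrent form for constant memory per step.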
Problem

Research questions and friction points this paper is trying to address.

Enhances MARL with performance and efficiency
Scales effectively with increasing agent numbers
Improves temporal reasoning in multi-agent systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts retention mechanism
Efficient multi-agent processing
Scales linearly with agents