🤖 AI Summary
In deep reinforcement learning, large models often suffer performance degradation as neural network capacity increases; although SoftMoE has been empirically shown to mitigate this issue, its underlying mechanism remains unclear. This work systematically investigates SoftMoE's role in online RL and identifies, for the first time, that its performance gains stem primarily from sequence-based tokenization of the encoder output (replacing conventional vector flattening), rather than from the multi-expert architecture per se. By designing a single-expert variant augmented with tokenization and parameter scaling, the authors fully replicate SoftMoE's sample efficiency and asymptotic return. Experiments across standard RL benchmarks confirm that tokenization is both sufficient and dominant, refuting the prevailing assumption that the multi-expert structure drives performance. These findings reduce computational and memory overhead while providing mechanistic insight into architectural design for scalable RL.
📝 Abstract
The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work, we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
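To make the distinction concrete, the sketch below contrasts the two ways of presenting a convolutional encoder's output that the abstract refers to: flattening it into a single vector versus tokenizing it into a sequence of per-location feature vectors. The shapes and names here are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

# Hypothetical conv-encoder output for one observation:
# a feature map of shape (height, width, channels).
h, w, c = 11, 11, 32
features = np.random.default_rng(0).standard_normal((h, w, c))

# Conventional approach: flatten the whole map into one long vector
# that feeds a dense policy/value head.
flat = features.reshape(-1)          # shape (h * w * c,) = (3872,)

# Tokenized approach (the factor the abstract credits): treat each
# spatial location as one token, giving a sequence of h*w tokens of
# dimension c for the downstream experts (or a single scaled expert).
tokens = features.reshape(h * w, c)  # shape (121, 32)

print(flat.shape, tokens.shape)
```

Both views contain exactly the same numbers; only the structure exposed to the downstream network differs, which is precisely the variable the paper isolates.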