🤖 AI Summary
To address the low sample efficiency of actor-critic agents in deep reinforcement learning, this paper proposes simplicial embeddings: a lightweight architectural component that constrains both policy and value-function representations to probability-simplex geometry. By inducing sparse, discretized features, it injects a strong geometric inductive bias into representation learning. This plug-and-play embedding layer integrates seamlessly into mainstream algorithms, including FastTD3, FastSAC, and PPO, without increasing training overhead, and it significantly improves gradient quality and value-function stability. Empirical evaluation on continuous-control (MuJoCo) and discrete-control (Atari) benchmarks demonstrates substantial reductions in the environment interactions required to reach target performance, alongside improved final policy performance. These results validate the effectiveness of simplex-geometric priors for representation learning and policy optimization.
📝 Abstract
Recent works have proposed accelerating the wall-clock training time of actor-critic methods via large-scale environment parallelization; unfortunately, these methods can still require a large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
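To make the idea concrete, one common way to constrain features to simplicial structures is to split the representation into groups and project each group onto the probability simplex with a temperature-controlled softmax; low temperatures yield the sparse, near-discrete features the abstract describes. The sketch below is a minimal NumPy illustration of this general construction, not the paper's exact parameterization (the function name, group count, and temperature value are illustrative assumptions).

```python
import numpy as np

def simplicial_embedding(z, num_groups, temperature=1.0):
    """Project a flat feature vector onto a product of probability simplices.

    Illustrative sketch (not the paper's exact layer): the vector ``z``
    (length divisible by ``num_groups``) is split into equal groups, and
    each group is mapped to the simplex via a softmax. As ``temperature``
    decreases, each group's output concentrates on one coordinate,
    producing sparse, near-discrete features.
    """
    z = np.asarray(z, dtype=np.float64).reshape(num_groups, -1)
    logits = z / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # each group sums to 1
    return probs.reshape(-1)

# Example: an 8-dim feature split into 2 simplices of size 4.
emb = simplicial_embedding(np.arange(8.0), num_groups=2, temperature=0.5)
```

Because each group lies on a simplex, every feature is nonnegative and each group sums to one, which bounds the embedding's norm; this is one plausible reason such a layer could stabilize critic bootstrapping.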