🤖 AI Summary
Gaussian policies in continuous control suffer from a geometric mismatch between their unbounded support and bounded action spaces, which forces the use of squashing functions that distort the action geometry. This work proposes Geometric Action Control (GAC), which decouples action modeling into a unit direction vector on the hypersphere and a learnable concentration parameter, a design that respects spherical geometry by construction. GAC employs spherical normalization, adaptive concentration tuning, and differentiable reparameterization to achieve *O(d)* computational complexity using only *d+1* parameters, avoiding the costly Bessel function evaluations and rejection sampling inherent in von Mises–Fisher (vMF) distributions. Evaluated on six MuJoCo benchmarks, GAC matches or exceeds baselines including SAC, achieving a 37.6% improvement over SAC on Ant-v4 and the best results on four of the six tasks.
📝 Abstract
Gaussian policies have dominated continuous control in deep reinforcement learning (RL), yet they suffer from a fundamental mismatch: their unbounded support requires ad-hoc squashing functions that distort the geometry of bounded action spaces. While von Mises–Fisher (vMF) distributions offer a theoretically grounded alternative on the sphere, their reliance on Bessel functions and rejection sampling hinders practical adoption. We propose **Geometric Action Control (GAC)**, a novel action generation paradigm that preserves the geometric benefits of spherical distributions while *simplifying computation*. GAC decomposes action generation into a direction vector and a learnable concentration parameter, enabling efficient interpolation between deterministic actions and uniform spherical noise. This design reduces the parameter count from *2d* to *d+1* and avoids the *O(dk)* complexity of vMF rejection sampling, requiring only simple *O(d)* operations. Empirically, GAC consistently matches or exceeds state-of-the-art methods across six MuJoCo benchmarks, achieving a 37.6% improvement over SAC on Ant-v4 and the best results on 4 out of 6 tasks. Our ablation studies reveal that both **spherical normalization** and **adaptive concentration control** are essential to GAC's success. These findings suggest that robust and efficient continuous control does not require complex distributions, but a principled respect for the geometry of action spaces. Code and pretrained models are available in supplementary materials.
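The direction-plus-concentration idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mixing rule, so the weight `kappa / (1 + kappa)` and the names `unit`, `sample_action`, `raw_direction`, and `kappa` are assumptions chosen for illustration, not the authors' implementation. The sketch only shows how *d* direction values plus one scalar can interpolate between a deterministic direction and uniform noise on the sphere using *O(d)* operations.

```python
import math
import random


def unit(v, eps=1e-8):
    """Project a vector onto the unit hypersphere (spherical normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def sample_action(raw_direction, kappa, rng):
    """Blend a mean direction with uniform spherical noise.

    raw_direction: d unnormalized values (hypothetical policy-head output).
    kappa: one non-negative concentration scalar, so d + 1 values in total.
    The weight kappa / (1 + kappa) is an assumed mixing rule, not the paper's.
    """
    mu = unit(raw_direction)
    # A normalized standard Gaussian sample is uniform on the sphere.
    xi = unit([rng.gauss(0.0, 1.0) for _ in mu])
    w = kappa / (1.0 + kappa)  # w = 0: pure noise; w -> 1: deterministic
    mixed = [w * m + (1.0 - w) * x for m, x in zip(mu, xi)]
    return unit(mixed)


rng = random.Random(0)
raw = [3.0, 0.0, 4.0]
a_noisy = sample_action(raw, 0.0, rng)   # uniform direction on the sphere
a_sharp = sample_action(raw, 1e6, rng)   # close to the mean direction
```

With `kappa = 0` the sample ignores the mean direction entirely, while a large `kappa` returns a vector close to the normalized `raw`. Every step is an elementwise operation or a norm, so no Bessel functions or rejection loops are involved. A trainable version would also need a differentiable framework and a log-probability or entropy term, both of which this sketch omits.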