Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Boltzmann exploration scales poorly in reinforcement learning (RL) problems with very large action sets whose actions are represented as hyperspherical embedding vectors, because it must compute a softmax value for every action. Method: The paper proposes von Mises-Fisher exploration (vMF-exp), which samples a direction around the state embedding from a von Mises-Fisher (vMF) distribution and then retrieves the nearest action embeddings with approximate nearest-neighbor search, instead of scoring every action. Contribution/Results: Under stated theoretical assumptions, vMF-exp asymptotically preserves the probability of exploring each action under Boltzmann exploration, while replacing the O(|A|) softmax over all actions with a nearest-neighbor query. Experiments on simulated data, public real-world data, and a large-scale deployment in the recommender system of a global music streaming service support these properties.
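A minimal sketch of one vMF-exp step, assuming NumPy, unit-normalized action embeddings, and a single retrieved action. The names `sample_vmf` and `vmf_explore` are hypothetical. The sampler uses Wood's (1994) rejection algorithm, a standard way to draw from a vMF distribution that may differ from the authors' implementation, and brute-force inner-product search stands in for the approximate nearest-neighbor index, so this toy version is still O(|A|) per step.

```python
import numpy as np


def sample_vmf(mu: np.ndarray, kappa: float, rng: np.random.Generator) -> np.ndarray:
    """Draw one unit vector from vMF(mu, kappa) on the sphere S^{d-1}, d >= 2 (Wood, 1994)."""
    mu = mu / np.linalg.norm(mu)
    d = mu.shape[0]
    # Step 1: rejection-sample w = <x, mu>, the cosine between the sample and mu.
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * np.log(1.0 - x0**2)
    while True:
        z = rng.beta((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = rng.uniform()
        if kappa * w + (d - 1) * np.log(1.0 - x0 * w) - c >= np.log(u):
            break
    # Step 2: pick a uniformly random direction orthogonal to mu.
    v = rng.standard_normal(d)
    v -= v.dot(mu) * mu
    v /= np.linalg.norm(v)
    # Step 3: combine; the result has unit norm because v is orthogonal to mu.
    return w * mu + np.sqrt(max(0.0, 1.0 - w**2)) * v


def vmf_explore(state_emb: np.ndarray, action_embs: np.ndarray, kappa: float,
                rng: np.random.Generator) -> int:
    """Hypothetical vMF-exp step: sample around the state embedding, return the nearest action."""
    direction = sample_vmf(state_emb, kappa, rng)
    # Exact inner-product search over all rows (O(|A| d)); a deployed system
    # would query an approximate nearest-neighbor index here instead.
    scores = action_embs @ direction
    return int(np.argmax(scores))


rng = np.random.default_rng(0)
actions = rng.standard_normal((1000, 16))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # hyperspherical embeddings
state = rng.standard_normal(16)
chosen = vmf_explore(state, actions, kappa=50.0, rng=rng)
```

Raising `kappa` concentrates the sampled direction around the state embedding (less exploration), while `kappa = 0` makes it uniform on the sphere.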

📝 Abstract

This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp involves initially sampling a state embedding representation using a von Mises-Fisher distribution, then exploring this representation's nearest neighbors, which scales to virtually unlimited numbers of candidate actions. We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that, nonetheless, suffers from scalability issues as it requires computing softmax values for each action. Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings. Experiments on simulated data, real-world public data, and the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service empirically validate the key properties of the proposed method.
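For comparison, a sketch of Boltzmann exploration (B-exp) as characterized in the abstract, assuming NumPy and unit-normalized embeddings. Writing the inverse temperature as the same `kappa` used by the vMF distribution is an assumption made so the two methods can be compared side by side; `boltzmann_explore` is a hypothetical name.

```python
import numpy as np


def boltzmann_explore(state_emb: np.ndarray, action_embs: np.ndarray, kappa: float,
                      rng: np.random.Generator) -> tuple[int, np.ndarray]:
    """Boltzmann exploration sketch: P(a_i) proportional to exp(kappa * <s, a_i>)."""
    # One score per action: this full pass over the action set is the O(|A|)
    # cost that vMF-exp is designed to avoid.
    logits = kappa * (action_embs @ state_emb)
    logits -= logits.max()  # shift by the max so exp() cannot overflow
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs


rng = np.random.default_rng(1)
actions = rng.standard_normal((500, 8))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
state = actions[7]
idx, probs = boltzmann_explore(state, actions, kappa=10.0, rng=rng)
```

Every call materializes the full `probs` vector, which is exactly what becomes impractical when the action set holds millions of items.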

Problem

Research questions and friction points this paper is trying to address.

Scalable exploration of large action sets in reinforcement learning

Efficient sampling using the von Mises-Fisher distribution for hyperspherical embeddings

Overcoming scalability issues of Boltzmann Exploration in large action spaces
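For reference, the standard vMF density on the unit sphere and a softmax form of Boltzmann exploration, written with shared symbols (state embedding s, action embeddings a_i, concentration κ) as an assumption rather than the paper's exact notation. Here I_ν is the modified Bessel function of the first kind. The normalizing sum in the second expression runs over every action, which is the source of the O(|A|) cost.

```latex
f_d(\mathbf{x};\boldsymbol{\mu},\kappa) = C_d(\kappa)\,\exp\!\left(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\qquad \|\mathbf{x}\| = \|\boldsymbol{\mu}\| = 1

P_{\text{B-exp}}(a_i \mid s) = \frac{\exp\left(\kappa\,\langle s, a_i\rangle\right)}{\sum_{j=1}^{|A|}\exp\left(\kappa\,\langle s, a_j\rangle\right)}
```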

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses von Mises-Fisher sampling for exploration

Scales to virtually unlimited numbers of candidate actions

Offers a scalable alternative to Boltzmann Exploration for actions represented by hyperspherical embeddings

🔎 Similar Papers

Diffusion Models Meet Contextual Bandits with Large Action Spaces