🤖 AI Summary
To address inefficient cooperative exploration in multi-agent reinforcement learning (MARL), this paper proposes a randomized least-squares value iteration (RLSVI) algorithm based on state-aggregated representations. The method establishes a concurrent learning framework and provides the first theoretical proof that randomizing value functions significantly improves parallel exploration efficiency among agents. It derives polynomial worst-case regret bounds for both finite- and infinite-horizon settings, achieving the optimal per-agent regret decay rate of $\Theta(1/\sqrt{N})$ in the number of agents $N$. Moreover, the algorithm reduces space complexity by a factor of $K$ while incurring only a $\sqrt{K}$-factor increase in regret. Numerical experiments validate both its theoretical guarantees and empirical effectiveness.
📝 Abstract
Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions for a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents *concurrently* explore an environment. The theoretical results established in this work provide an affirmative answer to this question. We adapt the concurrent learning framework to *randomized least-squares value iteration* (RLSVI) with an *aggregated state representation*. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurrent learning. Our algorithm exhibits significantly lower space complexity than those of Russo (2019) and Agrawal et al. (2021): we reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$-factor increase in the worst-case regret bound. Additionally, we conduct numerical experiments that corroborate our theoretical findings.
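To make the idea concrete, the following is a minimal, self-contained sketch of single-agent RLSVI with state aggregation on a toy episodic chain MDP. It is an illustration of the general technique only, not the paper's algorithm: the environment, the aggregation map `agg`, and the hyperparameters `sigma` and `lam` are all illustrative assumptions. Exploration comes from the Gaussian perturbation added to each regularized least-squares estimate, whose scale shrinks as visit counts grow.

```python
import numpy as np

rng = np.random.default_rng(0)

H, S, A = 5, 6, 2                 # horizon, raw states, actions (assumed toy sizes)
M = 3                             # number of aggregated "meta-states"
agg = np.array([0, 0, 1, 1, 2, 2])  # aggregation map phi: raw state -> meta-state

def step(s, a):
    # Deterministic chain: action 1 moves right, action 0 moves left;
    # reward 1 only for reaching the rightmost state.
    s2 = min(s + 1, S - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == S - 1 else 0.0)

sigma, lam = 1.0, 1.0             # noise scale and ridge regularizer (assumed)
data = [[] for _ in range(H)]     # per-step transitions (meta-state, a, r, s2)

def rlsvi_plan():
    # Backward pass: for each timestep, fit a perturbed regularized
    # least-squares Q-estimate over aggregated state-action pairs.
    Q = np.zeros((H + 1, M, A))   # Q[H] = 0 is the terminal value
    for h in reversed(range(H)):
        for m in range(M):
            for a in range(A):
                obs = [(r, s2) for (mm, aa, r, s2) in data[h]
                       if mm == m and aa == a]
                n = len(obs)
                targets = [r + Q[h + 1, agg[s2]].max() for (r, s2) in obs]
                mean = sum(targets) / (n + lam)            # ridge-shrunk mean
                noise = rng.normal(0.0, sigma / np.sqrt(n + lam))
                Q[h, m, a] = mean + noise                  # randomized estimate
    return Q

returns = []
for ep in range(200):
    Q = rlsvi_plan()              # resample a randomized Q before each episode
    s, total = 0, 0.0
    for h in range(H):
        a = int(Q[h, agg[s]].argmax())   # act greedily w.r.t. the perturbed Q
        s2, r = step(s, a)
        data[h].append((agg[s], a, r, s2))
        s, total = s2, total + r
    returns.append(total)
```

The concurrent setting studied in the paper would run many such agents in parallel on the same environment and pool their data, which is what drives the $\Theta(1/\sqrt{N})$ per-agent regret decay; the sketch above keeps a single agent for brevity.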