Towards Principled Multi-Agent Task Agnostic Exploration

📅 2025-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multi-agent reinforcement learning (MARL), cooperative exploration remains challenging when no task-specific prior knowledge is available. Method: We propose the first task-agnostic, decentralized autonomous exploration framework, unifying exploration via state-distribution entropy maximization. We systematically design multiple entropy variants and analyze their theoretical properties, and we extend task-free single-agent exploration to MARL by integrating trust-region constraints (an extension of TRPO) with distributed policy optimization, yielding a scalable, decentralized algorithm that guarantees broad state-space coverage. Contribution/Results: We establish theoretical convergence guarantees and derive a lower bound on state coverage. Empirical evaluation across multiple MARL benchmarks demonstrates significant improvements in state-space coverage and cross-task transfer generalization over existing baselines.
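The objective the summary refers to, maximizing the entropy of the state distribution induced by the agents' joint policy, can be sketched as follows. The notation below (a finite horizon $T$, a joint policy $\pi$, and the induced average state distribution $d^{\pi}$) is assumed for illustration and is not taken verbatim from the paper:

```latex
% State distribution induced by the joint policy \pi = (\pi^1, \dots, \pi^n)
% over a horizon T (notation assumed, not taken from the paper):
d^{\pi}(s) \;=\; \frac{1}{T} \sum_{t=1}^{T} \Pr(s_t = s \mid \pi)

% Task-agnostic exploration objective: maximize the entropy of d^{\pi}
\max_{\pi} \; H\!\big(d^{\pi}\big) \;=\; -\sum_{s \in \mathcal{S}} d^{\pi}(s) \log d^{\pi}(s)
```

The multi-agent question the paper studies is how to generalize and decompose this single objective across agents that each optimize their own policy $\pi^i$ in a decentralized way.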

📝 Abstract
In reinforcement learning, we typically refer to task-agnostic exploration when we aim to explore the environment without access to the task specification a priori. In a single-agent setting the problem has been extensively studied and is mostly understood. A popular approach casts the task-agnostic objective as maximizing the entropy of the state distribution induced by the agent's policy, from which principles and methods follow. In contrast, little is known about task-agnostic exploration in multi-agent settings, which are ubiquitous in the real world. How should different agents explore in the presence of others? In this paper, we address this question through a generalization to multiple agents of the problem of maximizing the state distribution entropy. First, we investigate alternative formulations, highlighting their respective strengths and weaknesses. Then, we present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. Finally, we provide proof-of-concept experiments that both corroborate the theoretical findings and pave the way for task-agnostic exploration in challenging multi-agent settings.
Problem

Research questions and friction points this paper is trying to address.

Multi-agent task-agnostic exploration
Maximizing state distribution entropy
Scalable decentralized policy search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent task-agnostic exploration
Decentralized trust-region policy search
Maximizing state distribution entropy