Multicopy Reinforcement Learning Agents

📅 2023-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper introduces “Multi-Replica Reinforcement Learning” (MR-RL), a novel paradigm wherein a single agent autonomously generates multiple homogeneous replicas in noisy environments to collaboratively accomplish the original task, thereby enhancing robustness and success rate. To address the challenge of dynamically determining the optimal number of replicas, we formulate the first Multi-Replica Markov Decision Process (MR-MDP) and propose a value-function decomposition–based policy gradient algorithm that adaptively balances replication benefits against computational resource costs. Crucially, our method enables end-to-end learning of the optimal replica count without requiring a predefined upper bound. Experiments across diverse noisy RL tasks demonstrate substantial improvements: +23.6% in task success rate and 37% reduction in training steps—highlighting significant gains in robustness, scalability, and generalization enabled by the multi-replica mechanism.
📝 Abstract
This paper examines a novel type of multi-agent problem, in which an agent makes multiple identical copies of itself in order to achieve a single-agent task better or more efficiently. This strategy improves performance if the environment is noisy and the task is sometimes unachievable by a single agent copy. We propose a learning algorithm for this multicopy problem that takes advantage of the structure of the value function to efficiently learn how to balance the advantages and costs of adding copies.
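To make the trade-off in the abstract concrete, here is a minimal illustrative sketch in Python. It is not the paper's algorithm. It assumes, purely for illustration, that each copy succeeds independently with the same probability, that the task counts as achieved if at least one copy succeeds, and that every copy adds a fixed cost. The names `success_probability`, `net_value`, `best_copy_count`, and the `max_copies` search cap are all hypothetical.

```python
def success_probability(p_single: float, n_copies: int) -> float:
    """Chance that at least one of n copies succeeds.

    Assumption: copies succeed independently, each with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_copies


def net_value(p_single: float, n_copies: int, reward: float, cost_per_copy: float) -> float:
    """Expected reward minus an assumed fixed cost for every copy created."""
    return reward * success_probability(p_single, n_copies) - cost_per_copy * n_copies


def best_copy_count(p_single: float, reward: float, cost_per_copy: float, max_copies: int = 20) -> int:
    """Brute-force search over 1..max_copies for the highest net value.

    max_copies is a cap chosen for this sketch only.
    """
    return max(
        range(1, max_copies + 1),
        key=lambda n: net_value(p_single, n, reward, cost_per_copy),
    )


if __name__ == "__main__":
    # A noisy task that a single copy completes only half the time.
    for n in range(1, 6):
        print(n, round(net_value(0.5, n, reward=1.0, cost_per_copy=0.1), 4))
    print("best:", best_copy_count(0.5, reward=1.0, cost_per_copy=0.1))
```

Under these assumptions each extra copy adds less success probability than the one before it, while the cost grows linearly, so the net value peaks at a finite number of copies. A learning algorithm for the multicopy problem has to discover this balance from experience rather than from a closed-form model like this one.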
Problem

Research questions and friction points this paper is trying to address.

Learning algorithm for multicopy agents in noisy environments
Balancing advantages and costs of multiple identical agent copies
Improving single-agent task performance via self-replication strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multicopy agents improve performance in noisy environments
Learning algorithm balances the advantages and costs of additional copies
Exploiting the structure of the value function makes multicopy learning more efficient
Alicia P. Wolfe
Wesleyan University
Oliver Diamond
Wesleyan University
Remi Feuerman
Wesleyan University
Magdalena Kisielinska
Wesleyan University
Brigitte Goeler-Slough
Wesleyan University
Victoria Manfredi
Associate Professor, Wesleyan University