🤖 AI Summary
This paper addresses the challenge of learning temporally coordinated multi-agent policies in multi-task settings under the centralized training with decentralized execution (CTDE) paradigm. To overcome the low sample efficiency of existing methods and their restriction to single-task settings, the authors propose ACC-MARL, a framework that models temporal tasks as finite-state automata, enabling explicit task decomposition and coordination among agents. ACC-MARL learns task-conditioned, decentralized team policies and uses the value functions of the learned policies to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door and holding the door open, along with improved sample efficiency and cross-task generalization.
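To make the core idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): a temporal task represented as a finite-state automaton, and a toy value-function-based assignment that picks the sub-task permutation maximizing the agents' summed value estimates. All names (`Dfa`, `assign_tasks`, the example events and values) are illustrative assumptions.

```python
# Illustrative sketch only; ACC-MARL's actual task representation and
# assignment procedure are defined in the paper, not here.
from itertools import permutations

class Dfa:
    """A temporal task as a DFA: start state, accepting states, transitions."""
    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = accepting
        self.transitions = transitions  # maps (state, event) -> next state

    def run(self, events):
        state = self.start
        for e in events:
            # Undefined (state, event) pairs self-loop: the event is ignored.
            state = self.transitions.get((state, e), state)
        return state in self.accepting

# "Press the button, then open the door" as a two-step temporal task.
task = Dfa(start=0, accepting={2},
           transitions={(0, "press_button"): 1, (1, "open_door"): 2})

def assign_tasks(values, agents, subtasks):
    """Assign sub-tasks to agents by maximizing summed value estimates.

    `values` maps (agent, subtask) pairs to scalar value estimates, standing
    in for the learned policies' value functions.
    """
    best = max(permutations(subtasks),
               key=lambda p: sum(values[(a, s)] for a, s in zip(agents, p)))
    return dict(zip(agents, best))
```

For example, `task.run(["press_button", "open_door"])` accepts, while the reversed event order does not, capturing the temporal ordering constraint; `assign_tasks` then chooses which agent pursues which sub-automaton.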
📝 Abstract
We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives under centralized training with decentralized execution. In this setting, using automata to represent tasks enables the decomposition of complex tasks into simpler sub-tasks that can be assigned to agents. However, existing approaches remain sample-inefficient and are limited to the single-task case. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify the main challenges to ACC-MARL's feasibility in practice, propose solutions, and prove the correctness of our approach. We further show that the value functions of the learned policies can be used to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, e.g., pressing a button to unlock a door, holding the door, and short-circuiting tasks.