MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the intersecting challenges of multi-objective optimization and multi-agent cooperative decision-making with a novel approach built on the centralized training with decentralized execution (CTDE) framework. The method feeds objective preference weights as conditional inputs into a conditional local Q-network and uses a mixing network with a parallel architecture to estimate the joint action-value function. An exploration-guided mechanism improves the uniformity and coverage of the resulting Pareto solution set. According to the authors, this is the first approach to unify multi-objective optimization and multi-agent coordination within a single framework. Empirical results show that the method significantly outperforms the baseline on all four evaluation metrics while generating a high-quality approximation of the Pareto front at lower computational cost.

📝 Abstract
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. In many real-world scenarios, tasks have several conflicting objectives and may require multiple agents to cooperate; these are multi-objective multi-agent decision-making problems. However, only a few works have been conducted at this intersection. Existing approaches are limited to separate fields and can handle only multi-agent decision-making with a single objective, or multi-objective decision-making with a single agent. In this paper, we propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem. Our approach is based on the centralized training with decentralized execution (CTDE) framework. A weight vector representing the preference over objectives is fed into the decentralized agent network as a condition for local action-value function estimation, while a mixing network with a parallel architecture estimates the joint action-value function. In addition, an exploration guide approach is applied to improve the uniformity of the final non-dominated solutions. Experiments demonstrate that the proposed method can effectively solve the multi-objective multi-agent cooperative decision-making problem and generate an approximation of the Pareto set. Our approach not only significantly outperforms the baseline method on all four evaluation metrics, but also requires less computational cost.
Problem

Research questions and friction points this paper is trying to address.

multi-objective
multi-agent
cooperative decision-making
reinforcement learning
Pareto set
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-objective reinforcement learning
multi-agent cooperation
centralized training with decentralized execution
Pareto set approximation
deep reinforcement learning