Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-agent reinforcement learning (MARL) for unmanned surface vehicle (USV) swarms faces challenges in modeling user preferences, designing reward functions, and resolving credit-assignment ambiguity. Method: We propose the first agent-level feedback classification framework for MARL, structured across intra-agent, inter-agent, and intra-team levels. Integrating reinforcement learning from human feedback (RLHF) with multi-granularity feedback modeling, we employ large language models (LLMs) as implicit preference evaluators, replacing costly explicit human annotations. Validation spans scenarios including region-constrained navigation, collision avoidance, and task allocation. Contribution/Results: Experiments demonstrate effective policy refinement while maintaining fairness and performance consistency. The approach supports robust, coordinated execution of search-and-rescue and surveillance missions by USV swarms in complex, dynamic maritime environments.

📝 Abstract
Multi-Agent Reinforcement Learning (MARL) has shown promise in solving complex problems involving cooperation and competition among agents, such as an Unmanned Surface Vehicle (USV) swarm used in search and rescue, surveillance, and vessel protection. However, aligning system behavior with user preferences is challenging due to the difficulty of encoding expert intuition into reward functions. To address the issue, we propose a Reinforcement Learning with Human Feedback (RLHF) approach for MARL that resolves credit-assignment challenges through an Agent-Level Feedback system categorizing feedback into intra-agent, inter-agent, and intra-team types. To overcome the challenges of direct human feedback, we employ a Large Language Model (LLM) evaluator to validate our approach using feedback scenarios such as region constraints, collision avoidance, and task allocation. Our method effectively refines USV swarm policies, addressing key challenges in multi-agent systems while maintaining fairness and performance consistency.
Problem

Research questions and friction points this paper is trying to address.

Aligning USV swarm behavior with human preferences
Resolving credit-assignment challenges in MARL
Validating feedback using LLM for policy refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning with Human Feedback (RLHF)
Agent-Level Feedback system for MARL
Large Language Model (LLM) evaluator
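To make the RLHF idea above concrete, here is an illustrative sketch of a Bradley-Terry preference model, one common way to turn pairwise feedback into a learned reward. The trajectory features, labels, and data below are hypothetical stand-ins for an LLM evaluator's judgments, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_reward(pairs, lr=0.5, epochs=300):
    """Fit a linear reward r(traj) = w @ features so that, for each pair,
    P(A preferred over B) = sigmoid(r(A) - r(B)) matches the label."""
    w = np.zeros(len(pairs[0][0]))
    for _ in range(epochs):
        for feats_a, feats_b, label in pairs:
            p_a = sigmoid(w @ (feats_a - feats_b))
            # Gradient step on the negative log-likelihood of the label
            w -= lr * (p_a - label) * (feats_a - feats_b)
    return w

# Toy trajectory features (hypothetical): [fraction of time inside the allowed
# region, distance margin to the nearest teammate]. The simulated evaluator
# prefers trajectories that respect the region constraint (feature 0).
pairs = [
    (np.array([1.0, 0.2]), np.array([0.0, 0.5]), 1.0),  # A preferred
    (np.array([0.1, 0.9]), np.array([0.9, 0.1]), 0.0),  # B preferred
    (np.array([0.8, 0.8]), np.array([0.3, 0.9]), 1.0),
    (np.array([0.2, 0.1]), np.array([0.7, 0.0]), 0.0),
]

w = train_reward(pairs)
# Check that the learned reward ranks each preferred trajectory above its
# alternative on the training pairs.
correct = sum(
    ((w @ a > w @ b) and label == 1.0) or ((w @ a < w @ b) and label == 0.0)
    for a, b, label in pairs
)
print(correct, "of", len(pairs))
```

The learned weight on the region-constraint feature ends up positive, so the reward model reproduces the evaluator's preference ordering and could then be plugged into a standard MARL policy-gradient loop.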
Hyeonjun Kim
Korea Military Academy, Weapon System Engineering
Multi-agent system · Reinforcement Learning · Robotics · M&S
Kanghoon Lee
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Junho Park
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Jiachen Li
University of California, Riverside, CA, USA
Jinkyoo Park
Department of Industrial and Systems Engineering, KAIST
Machine Learning · Game Theory · Optimal Control