Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-agent reinforcement learning (MARL) for unmanned surface vehicle (USV) swarms faces challenges in modeling user preferences, designing reward functions, and resolving credit-assignment ambiguity. Method: We propose the first agent-level feedback classification framework for MARL, structured across intra-agent, inter-agent, and intra-team levels. Integrating reinforcement learning from human feedback (RLHF) with multi-granularity feedback modeling, we employ large language models (LLMs) as implicit preference evaluators, replacing costly explicit human annotations. Validation spans scenarios including region-constrained navigation, collision avoidance, and task allocation. Contribution/Results: Experiments demonstrate effective policy refinement while maintaining fairness and performance consistency. The approach supports robust, coordinated execution of search-and-rescue and surveillance missions by USV swarms in complex, dynamic maritime environments.

📝 Abstract
Multi-Agent Reinforcement Learning (MARL) has shown promise in solving complex problems involving cooperation and competition among agents, such as an Unmanned Surface Vehicle (USV) swarm used in search and rescue, surveillance, and vessel protection. However, aligning system behavior with user preferences is challenging due to the difficulty of encoding expert intuition into reward functions. To address the issue, we propose a Reinforcement Learning with Human Feedback (RLHF) approach for MARL that resolves credit-assignment challenges through an Agent-Level Feedback system categorizing feedback into intra-agent, inter-agent, and intra-team types. To overcome the challenges of direct human feedback, we employ a Large Language Model (LLM) evaluator to validate our approach using feedback scenarios such as region constraints, collision avoidance, and task allocation. Our method effectively refines USV swarm policies, addressing key challenges in multi-agent systems while maintaining fairness and performance consistency.
Problem

Research questions and friction points this paper is trying to address.

Aligning USV swarm behavior with human preferences
Resolving credit-assignment challenges in MARL
Validating feedback using LLM for policy refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning with Human Feedback (RLHF)
Agent-Level Feedback system for MARL
Large Language Model (LLM) evaluator
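To make the RLHF idea above concrete, here is an illustrative sketch of a Bradley-Terry preference model, one common way to turn pairwise feedback into a learned reward. The trajectory features, labels, and data below are hypothetical stand-ins for an LLM evaluator's judgments, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_reward(pairs, lr=0.5, epochs=300):
    """Fit a linear reward r(traj) = w @ features so that, for each pair,
    P(A preferred over B) = sigmoid(r(A) - r(B)) matches the label."""
    w = np.zeros(len(pairs[0][0]))
    for _ in range(epochs):
        for feats_a, feats_b, label in pairs:
            p_a = sigmoid(w @ (feats_a - feats_b))
            # Gradient step on the negative log-likelihood of the label
            w -= lr * (p_a - label) * (feats_a - feats_b)
    return w

# Toy trajectory features (hypothetical): [fraction of time inside the allowed
# region, distance margin to the nearest teammate]. The simulated evaluator
# prefers trajectories that respect the region constraint (feature 0).
pairs = [
    (np.array([1.0, 0.2]), np.array([0.0, 0.5]), 1.0),  # A preferred
    (np.array([0.1, 0.9]), np.array([0.9, 0.1]), 0.0),  # B preferred
    (np.array([0.8, 0.8]), np.array([0.3, 0.9]), 1.0),
    (np.array([0.2, 0.1]), np.array([0.7, 0.0]), 0.0),
]

w = train_reward(pairs)
# Check that the learned reward ranks each preferred trajectory above its
# alternative on the training pairs.
correct = sum(
    ((w @ a > w @ b) and label == 1.0) or ((w @ a < w @ b) and label == 0.0)
    for a, b, label in pairs
)
print(correct, "of", len(pairs))
```

The learned weight on the region-constraint feature ends up positive, so the reward model reproduces the evaluator's preference ordering and could then be plugged into a standard MARL policy-gradient loop.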
Hyeonjun Kim
Korea Military Academy, Weapon System Engineering
Multi-agent system · Reinforcement Learning · Robotics · M&S
Kanghoon Lee
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Junho Park
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Jiachen Li
University of California, Riverside, CA, USA
Jinkyoo Park
Department of Industrial and Systems Engineering, KAIST
Machine Learning · Game Theory · Optimal Control