Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of safe, coordinated conflict resolution for small unmanned aircraft systems (sUAS) operating in low-altitude, dense, partially observable, and heterogeneous multi-agent environments. The authors propose a large language model (LLM)-based short-horizon decision-making approach, leveraging the BlueSky simulator to generate regulation-compliant air traffic data and integrating heuristic strategies from human air traffic controllers to establish a simulation-to-language data generation pipeline. For the first time, they combine LoRA-based supervised fine-tuning with GRPO preference optimization to efficiently fine-tune the Qwen-Math-7B model, enhancing domain alignment and multi-agent coordination capabilities. Experimental results demonstrate that the proposed method significantly improves decision accuracy, consistency, and separation maintenance while substantially reducing near-miss conflicts; GRPO further strengthens cooperative performance, though robustness declines slightly in interactions involving heterogeneous policies.
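The simulation-to-language pipeline described above turns simulated conflict states into text prompts paired with heuristic resolution labels. A minimal sketch of one such step is below; the field names, prompt wording, and the toy right-of-way rule are illustrative assumptions, not the paper's actual schema or the heuristics its human controllers contributed.

```python
# Sketch: serialize a pairwise sUAS conflict snapshot into an instruction
# prompt plus a heuristic resolution label, forming one fine-tuning sample.
# All field names and the right-turn rule are hypothetical, for illustration.

def state_to_prompt(ownship, intruder):
    """Render a two-aircraft conflict snapshot as a natural-language prompt."""
    return (
        f"Ownship {ownship['id']}: heading {ownship['hdg']} deg, "
        f"speed {ownship['spd']} kt, altitude {ownship['alt']} ft. "
        f"Intruder {intruder['id']}: heading {intruder['hdg']} deg, "
        f"speed {intruder['spd']} kt, altitude {intruder['alt']} ft. "
        f"Closest point of approach in {ownship['tcpa']} s. "
        "Choose one maneuver: turn-left, turn-right, climb, descend, maintain."
    )

def heuristic_label(ownship, intruder):
    """Toy right-of-way rule: give way to traffic approaching from the right."""
    rel_bearing = (intruder['brg'] - ownship['hdg']) % 360
    if 0 < rel_bearing < 180:   # intruder on the right: ownship gives way
        return "turn-right"
    return "maintain"

own = {"id": "UAV1", "hdg": 90, "spd": 40, "alt": 300, "tcpa": 25}
intr = {"id": "UAV2", "hdg": 270, "spd": 35, "alt": 300, "brg": 135}
sample = {"prompt": state_to_prompt(own, intr),
          "response": heuristic_label(own, intr)}
```

Each emitted `sample` is one supervised (prompt, response) pair; closed-loop evaluation would then replay the model's chosen maneuvers inside the simulator.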
📝 Abstract
The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments, where both cooperative separation assurance and operational efficiency must be maintained. While Large Language Models (LLMs) exhibit strong reasoning capabilities, their direct application to air traffic control remains limited by insufficient domain grounding and unpredictable output inconsistency. This paper investigates LLMs as decision-makers in cooperative multi-agent tactical deconfliction using fine-tuning strategies that align model outputs to human operator heuristics. We propose a simulation-to-language data generation pipeline based on the BlueSky air traffic simulator that produces rule-consistent deconfliction datasets reflecting established safety practices. A pretrained Qwen-Math-7B model is fine-tuned using two parameter-efficient strategies: supervised fine-tuning with Low-Rank Adaptation (LoRA) and preference-based fine-tuning combining LoRA with Group-Relative Policy Optimization (GRPO). Experimental results on validation datasets and closed-loop simulations demonstrate that supervised LoRA fine-tuning substantially improves decision accuracy, consistency, and separation performance compared to the pretrained LLM, with significant reductions in near mid-air collisions. GRPO provides additional coordination benefits but exhibits reduced robustness when interacting with heterogeneous agent policies.
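The abstract's first fine-tuning strategy relies on Low-Rank Adaptation, which freezes the pretrained weight matrix and learns only a low-rank update. The pure-Python sketch below shows the core parameterization with toy dimensions (not Qwen-Math-7B's actual layer sizes, where the parameter savings are far larger).

```python
# Minimal sketch of the LoRA idea: keep the pretrained weight W frozen and
# train only a low-rank update B @ A, so trainable parameters per layer drop
# from d*d to 2*r*d. Toy dimensions; real adapters sit inside attention/MLP
# projections of the base model.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """W_eff = W + (alpha / r) * B @ A, with B (d x r) and A (r x d)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, r, alpha = 8, 2, 4
W = [[0.0] * d for _ in range(d)]   # frozen pretrained weight (d x d)
A = [[1.0] * d for _ in range(r)]   # trainable down-projection (r x d)
B = [[1.0] * r for _ in range(d)]   # trainable up-projection (d x r)
W_eff = lora_effective_weight(W, A, B, alpha, r)

full_params = d * d        # parameters if W were fully trainable
lora_params = 2 * r * d    # trainable LoRA parameters
```

Because only `A` and `B` receive gradients, the same frozen base model can host the supervised adapter and the GRPO-tuned adapter as two small weight deltas.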
Problem

Research questions and friction points this paper is trying to address.

tactical deconfliction
small Unmanned Aerial Systems
multi-agent coordination
safety-critical decision-making
low-altitude airspace
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Tactical Deconfliction
Low-Rank Adaptation
Preference-based Fine-tuning
Multi-agent Coordination
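The preference-based fine-tuning listed above pairs LoRA with Group-Relative Policy Optimization. GRPO's distinguishing step is scoring a group of sampled responses per prompt and normalizing rewards within the group instead of learning a value critic; a sketch of that advantage computation follows, with illustrative rewards (e.g. +1 for a maneuver that preserves separation), not the paper's actual reward design.

```python
# Sketch of the group-relative advantage at the heart of GRPO: sample a
# group of candidate responses for one prompt, score each with a scalar
# reward, and standardize within the group (no learned value function).
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """A_i = (r_i - mean(r)) / (std(r) + eps) over one sampled group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled deconfliction decisions for one conflict prompt:
rewards = [1.0, 0.0, 1.0, 0.0]   # two safe maneuvers, two losses of separation
advs = group_relative_advantages(rewards)
```

Decisions scoring above the group mean get positive advantages (their token probabilities are pushed up), those below get negative ones; the advantages in each group sum to zero by construction.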