AI Summary
This study addresses safety at non-towered airports, where pilots coordinate through self-announced radio calls on the Common Traffic Advisory Frequency (CTAF) and automated risk assessment tools are lacking, which raises the likelihood of near mid-air collisions. The work proposes a multimodal large language model framework that integrates CTAF transcripts, METAR weather reports, ADS-B flight trajectories, and aeronautical charts. It introduces a risk taxonomy of twelve hazard categories and constructs a synthetic evaluation dataset derived from real examples. Using only CTAF and METAR inputs, open-source models such as Qwen 2.5-7B and Mistral-7B typically achieve macro F1 scores above 0.85 on a binary nominal/danger classification task. In a separate qualitative case study on real flight data, Gemini 2.5 Pro correctly identifies a right-of-way violation, which suggests the approach may be practically useful.
Abstract
We investigate frameworks for post-flight safety analysis at non-towered airports using large language models (LLMs). Non-towered airports rely on the Common Traffic Advisory Frequency (CTAF) for air traffic coordination and experience frequent near mid-air collisions due to the pilot self-announcement communication protocol. We propose a general vision-language model (VLM) approach that analyzes transcribed CTAF radio communications in natural language, METeorological Aerodrome Report (METAR) weather data, Automatic Dependent Surveillance-Broadcast (ADS-B) flight trajectories, and Visual Flight Rules sectional charts of the airfield. We provide a preliminary study at Half Moon Bay Airport, consisting of a qualitative real-world case study and a quantitative evaluation on a new synthetic dataset covering the communications and weather modalities. In the qualitative study, we apply our framework to real flight data using Gemini 2.5 Pro and show that it accurately identifies a right-of-way violation. The synthetic dataset is derived from real examples and includes a 12-category hazard taxonomy. We use it to benchmark three open-source LLMs (Qwen 2.5-7B, Mistral-7B, Gemma-2-9B) and three closed-source LLMs (GPT-4o, GPT-5.4, Claude Sonnet 4.6) on the subset of inputs related to CTAF and METAR. Even when limited to CTAF and METAR inputs and open-source LLMs, instances of our framework typically achieve a macro F1 score above 0.85 on a binary nominal/danger classification task. Future work includes a quantitative evaluation across all modalities and a larger number of real-world examples. Taken together, our results suggest that VLM analysis of safety at non-towered airports may be a valuable future capability.
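For readers unfamiliar with the reported metric, macro F1 on the binary nominal/danger task is the unweighted mean of the per-class F1 scores, so the rarer class counts as much as the common one. The following is a minimal sketch of that computation. The function name, the label strings, and the toy data are illustrative assumptions and are not taken from the paper or its dataset.

```python
def macro_f1(y_true, y_pred, labels=("nominal", "danger")):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example (made-up labels, not real results).
y_true = ["nominal", "nominal", "danger", "danger"]
y_pred = ["nominal", "danger", "danger", "danger"]
print(round(macro_f1(y_true, y_pred), 4))
```

Here the nominal class has F1 = 2/3 and the danger class has F1 = 4/5, so the macro average is 11/15.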