🤖 AI Summary
To address the poor generalizability of airport surface movement prediction, which stems from the lack of large-scale, standardized public datasets, this paper introduces Amelia-48, a large multi-airport trajectory dataset (48 US airports, ~70 TB, over two years of data). The authors further propose AmeliaTF, a multi-agent Transformer baseline for multi-airport trajectory forecasting. They also establish Amelia-10, a training and evaluation benchmark built from 292 days of post-processed data from 10 airports, together with a series of experiments intended to promote foundation models in aviation. The methodology integrates spatiotemporal graph modeling, multi-source map fusion (e.g., AIXM, GeoJSON), and real-world data from the FAA's SWIM system. Baseline results for AmeliaTF are reported across the benchmark, with downstream applications such as taxi-out time prediction, collision risk assessment, and emission estimation as motivation. The framework and tools are publicly released as a resource for the aviation AI research community.
📝 Abstract
The growing demand for air travel necessitates advancements in air traffic management technologies to ensure safe and efficient operations. Predictive models for terminal airspace can help anticipate future movements and traffic flows, enabling proactive planning for efficient coordination, collision risk assessment, taxi-out time prediction, departure metering, and emission estimation. Although data-driven predictive models have shown promise in tackling some of these challenges, the absence of large-scale curated surface movement datasets in the public domain has hindered the development of scalable and generalizable approaches. In this context, we propose the Amelia framework, which consists of four key contributions. First, we introduce Amelia-48, a large dataset of airport surface movement collected through the FAA's System Wide Information Management (SWIM) Program. This dataset includes over two years' worth of trajectory data (~70 TB) across 48 US airports, along with map data. Second, we develop AmeliaTF, a large transformer-based baseline for multi-agent, multi-airport trajectory forecasting. Third, we propose Amelia-10, a training and evaluation benchmark consisting of 292 days of post-processed data from 10 different airports, together with a series of experiments to promote the development of foundation models in aviation. We provide baseline results across our benchmark using AmeliaTF. Finally, we release our framework and tools to encourage further aviation research in the forecasting domain and beyond at https://ameliacmu.github.io