🤖 AI Summary
Cybersecurity threats to transportation systems are escalating with digitalization and interconnectivity, yet comprehensive, centralized records of incidents remain scarce. To address this gap, we propose a generative AI-based framework that uses a fine-tuned large language model (LLM) to extract, classify, and structure cybersecurity incident data from five publicly available sources: CSIS, UMCED, EuRepoC, MCAD, and TraCR. The result is a transportation-specific cyber incident database organized into five modes: aviation, maritime, rail, road, and multimodal. We also build a retrieval-augmented generation (RAG) question-answering system that lets users query the database in natural language. Together, these components make incident information easier to find, interpret, and use, with the aim of improving cybersecurity awareness in the transportation sector.
📝 Abstract
Technological advancements have revolutionized numerous industries, including transportation. While digitalization, automation, and connectivity have enhanced safety and efficiency, they have also introduced new vulnerabilities. With 95% of data breaches attributed to human error, promoting cybersecurity awareness in transportation is increasingly critical. Despite numerous cyberattacks on transportation systems worldwide, comprehensive and centralized records of these incidents remain scarce. To address this gap and enhance cyber awareness, this paper presents a large language model (LLM)-based approach to extract and organize transportation-related cyber incidents from publicly available datasets. A key contribution of this work is the use of generative AI to transform unstructured, heterogeneous cyber incident data into structured formats. Incidents were sourced from the Center for Strategic & International Studies (CSIS) List of Significant Cyber Incidents, the University of Maryland Cyber Events Database (UMCED), the European Repository of Cyber Incidents (EuRepoC), the Maritime Cyber Attack Database (MCAD), and the U.S. DOT Transportation Cybersecurity and Resiliency (TraCR) Examples of Cyber Attacks in Transportation (2018 to 2022). A fine-tuned LLM classified these incidents into five transportation modes (aviation, maritime, rail, road, and multimodal), forming a transportation-specific cyber incident database. Another key contribution is a retrieval-augmented generation (RAG) question-answering system, which makes the curated database more accessible and practical by letting users query it for specific details on transportation-related cyber incidents. By leveraging LLMs for both data extraction and user interaction, this study contributes a novel, accessible tool for improving cybersecurity awareness in the transportation sector.
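To make the two contributions concrete, the following is a minimal Python sketch of what a structured incident record and the retrieval step of a RAG pipeline might look like. It is not the authors' implementation: the `IncidentRecord` fields, the token-overlap scoring in `retrieve`, and the prompt wording in `build_prompt` are all assumptions chosen for illustration, and the sample records are invented placeholders rather than entries from any of the source databases. A real system would likely use embedding-based retrieval and pass the prompt to an LLM.

```python
import re
from dataclasses import dataclass

# The five transportation modes named in the abstract.
MODES = ("aviation", "maritime", "rail", "road", "multimodal")


@dataclass(frozen=True)
class IncidentRecord:
    """Hypothetical structured form of one incident (field names are assumptions)."""
    source: str
    year: int
    description: str
    mode: str

    def __post_init__(self):
        # Reject labels outside the five-mode scheme.
        if self.mode not in MODES:
            raise ValueError(f"unknown transportation mode: {self.mode!r}")


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, records, k=2):
    """Return up to k records ranked by token overlap with the query.

    Token overlap is a simple stand-in for the retrieval step of RAG.
    """
    query_tokens = tokenize(query)
    scored = []
    for record in records:
        overlap = len(query_tokens & tokenize(record.description + " " + record.mode))
        if overlap > 0:
            scored.append((overlap, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


def build_prompt(query, hits):
    """Assemble the text that would be sent to an LLM (the LLM call is omitted)."""
    context = "\n".join(
        f"- [{r.source}, {r.year}, {r.mode}] {r.description}" for r in hits
    )
    return f"Answer using only these incidents:\n{context}\n\nQuestion: {query}"


# Placeholder records invented for illustration; they are not real incidents.
records = [
    IncidentRecord("EXAMPLE-A", 2020, "Ransomware disrupted a port terminal booking system", "maritime"),
    IncidentRecord("EXAMPLE-B", 2021, "Phishing campaign targeted an airline reservation system", "aviation"),
    IncidentRecord("EXAMPLE-C", 2019, "Signalling network outage after malware infection", "rail"),
]

query = "Which maritime port incidents involved ransomware?"
hits = retrieve(query, records, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

Validating the mode label inside the record keeps any out-of-scheme classifier output from silently entering the database, which matters when the labels come from a generative model.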