Transportation Cyber Incident Awareness through Generative AI-Based Incident Analysis and Retrieval-Augmented Question-Answering Systems

📅 2025-08-04
🤖 AI Summary
Cybersecurity threats to transportation systems are escalating with digitalization and interconnectivity, yet no systematic, structured repository of transportation cyber incident records exists. To address this gap, we propose a generative AI-based framework that uses fine-tuned large language models (LLMs) to extract, classify, and structure cybersecurity incident data from several open-source databases: CSIS, UMCED, EuRepoC, MCAD, and TraCR. We introduce a domain-specific transportation cybersecurity incident database covering five modes: aviation, maritime, rail, road, and multimodal transport. We further integrate retrieval-augmented generation (RAG) into an interactive question-answering system that accepts natural-language queries. This work lays a knowledge-management foundation for transportation cybersecurity, aiming to make incident information easier to discover, interpret, and use operationally.

📝 Abstract
Technological advancements have revolutionized numerous industries, including transportation. While digitalization, automation, and connectivity have enhanced safety and efficiency, they have also introduced new vulnerabilities. With 95% of data breaches attributed to human error, promoting cybersecurity awareness in transportation is increasingly critical. Despite numerous cyberattacks on transportation systems worldwide, comprehensive and centralized records of these incidents remain scarce. To address this gap and enhance cyber awareness, this paper presents an LLM-based approach to extract and organize transportation-related cyber incidents from publicly available datasets. A key contribution of this work is the use of generative AI to transform unstructured, heterogeneous cyber incident data into structured formats. Incidents were sourced from the Center for Strategic & International Studies (CSIS) List of Significant Cyber Incidents, the University of Maryland Cyber Events Database (UMCED), the European Repository of Cyber Incidents (EuRepoC), the Maritime Cyber Attack Database (MCAD), and the U.S. DOT Transportation Cybersecurity and Resiliency (TraCR) Examples of Cyber Attacks in Transportation (2018 to 2022). These incidents were classified by a fine-tuned LLM into five transportation modes (aviation, maritime, rail, road, and multimodal), forming a transportation-specific cyber incident database. Another key contribution of this work is the development of a Retrieval-Augmented Generation (RAG) question-answering system, designed to enhance accessibility and practical use by enabling users to query the curated database for specific details on transportation-related cyber incidents. By leveraging LLMs for both data extraction and user interaction, this study contributes a novel, accessible tool for improving cybersecurity awareness in the transportation sector.
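The abstract does not spell out the structured schema the extraction step produces. As a minimal sketch, assuming the extraction LLM is prompted to emit one JSON object per incident, the step below parses that output into a typed record and validates the label against the five transportation modes the paper uses. The field names (`date`, `mode`, `summary`), the `IncidentRecord` class, and `parse_llm_output` are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass

# The five transportation modes used to label incidents in the paper.
MODES = ("aviation", "maritime", "rail", "road", "multimodal")


@dataclass
class IncidentRecord:
    # Hypothetical schema for illustration; the paper's actual fields may differ.
    source: str   # originating database, e.g. "CSIS", "UMCED", "EuRepoC", "MCAD", "TraCR"
    date: str     # date string as extracted by the LLM
    mode: str     # one of MODES
    summary: str  # normalized short description of the incident


def parse_llm_output(raw: str, source: str) -> IncidentRecord:
    """Turn one JSON object emitted by an (assumed) extraction LLM into a validated record."""
    data = json.loads(raw)
    # Normalize the mode label so " Maritime " and "maritime" are treated alike.
    mode = str(data.get("mode", "")).strip().lower()
    if mode not in MODES:
        raise ValueError(f"unknown transportation mode: {mode!r}")
    return IncidentRecord(
        source=source,
        date=str(data.get("date", "")).strip(),
        mode=mode,
        summary=str(data.get("summary", "")).strip(),
    )


if __name__ == "__main__":
    # Made-up LLM output, not a real incident record.
    raw = '{"date": "2020-01-01", "mode": " Maritime ", "summary": "Hypothetical ransomware incident at a port."}'
    record = parse_llm_output(raw, source="MCAD")
    print(record.mode)  # maritime
```

Rejecting unknown labels at parse time keeps malformed or hallucinated classifications out of the curated database instead of silently storing them.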
Problem

Research questions and friction points this paper is trying to address.

Extracting and organizing transportation-related cyber incidents from publicly available datasets
Transforming unstructured, heterogeneous cyber incident data into structured formats
Making cyber incident information accessible through question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI transforms unstructured cyber incident data into structured records
A fine-tuned LLM classifies incidents into five transportation modes
A Retrieval-Augmented Generation (RAG) system enables natural-language querying of the curated database
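The RAG question-answering component is described here only at a high level. The sketch below illustrates the general retrieve-then-prompt pattern, assuming incident records are stored as plain text. It substitutes simple bag-of-words cosine similarity for whatever retriever the authors actually used (RAG systems commonly use dense embeddings), and it omits the final call to a generator LLM. The functions `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for a learned embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank incident records by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Ground the generator in the retrieved records only.
    lines = ["Answer using only the incident records below.", ""]
    lines += [f"[{i}] {doc}" for i, doc in enumerate(context, start=1)]
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up records for illustration, not entries from the paper's database.
    docs = [
        "maritime: ransomware disrupted a port terminal booking system (hypothetical)",
        "rail: phishing email compromised a rail operator ticketing account (hypothetical)",
        "aviation: ddos attack took airport websites offline (hypothetical)",
    ]
    question = "Which incidents affected a port?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

Replacing `cosine` with an embedding-based similarity would keep the same structure; the assembled prompt would then be sent to the generator LLM to produce the answer.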
Ostonya Thomas
Glenn Department of Civil Engineering, Clemson University, Clemson, South Carolina, 29634

Muhaimin Bin Munir
Graduate Research Assistant, University of Texas at Dallas
Natural Language Processing, Computer Vision, Image Processing, Machine Learning

Jean-Michel Tine
Glenn Department of Civil Engineering, Clemson University, Clemson, South Carolina, 29634

Mizanur Rahman
Department of Civil, Construction & Environmental Engineering, The University of Alabama, Tuscaloosa, Alabama, 35487

Yuchen Cai
Data Mining Lab, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080

Khandakar Ashrafi Akbar
Data Mining Lab, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080

Md Nahiyan Uddin
University of Texas at Dallas
Natural Language Processing (NLP), Large Language Models (LLM), Artificial Intelligence (AI)

Latifur Khan
Professor, University of Texas at Dallas
Data Streams, Big Data Analytics, Text Analytics, Cyber Security, Geographic Data Processing

Trayce Hockstad
University of Alabama
Law, Policy, Transportation

Mashrur Chowdhury
Founding Director, National Center for Transportation Cybersecurity and Resiliency
CPS Cybersecurity, Transportation Cyber-Physical-Social Systems, Connected Autonomous Vehicles