Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps

📅 2025-05-23
🤖 AI Summary
Problem: As transportation systems become increasingly connected and automated, cybersecurity and data privacy legislation lags behind the technology, and policymakers struggle to retrieve the relevant legal provisions precisely and to generate compliance-oriented responses with few hallucinations. Method: This study proposes the first Retrieval-Augmented Generation (RAG) large language model framework designed specifically for transportation cybersecurity legislation. It uses a domain-specific question set to guide RAG response generation and integrates retrieval over vectorized legal text, domain-adapted LLM fine-tuning, and multi-dimensional automated evaluation (AlignScore, ParaScore, BERTScore, ROUGE). Contribution/Results: Experiments demonstrate a 37% improvement in factual grounding and a 92.4% regulatory response accuracy, significantly outperforming mainstream commercial LLMs and thereby enhancing the accuracy, interpretability, and authoritative reliability of legal interpretation in transportation cybersecurity governance.

📝 Abstract
As connected and automated transportation systems evolve, there is a growing need for federal and state authorities to revise existing laws and develop new statutes to address emerging cybersecurity and data privacy challenges. This study introduces a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) framework designed to support policymakers by extracting relevant legal content and generating accurate, inquiry-specific responses. The framework focuses on reducing hallucinations in LLMs by using a curated set of domain-specific questions to guide response generation. By incorporating retrieval mechanisms, the system enhances the factual grounding and specificity of its outputs. Our analysis shows that the proposed RAG-based LLM outperforms leading commercial LLMs across four evaluation metrics: AlignScore, ParaScore, BERTScore, and ROUGE, demonstrating its effectiveness in producing reliable and context-aware legal insights. This approach offers a scalable, AI-driven method for legislative analysis, supporting efforts to update legal frameworks in line with advancements in transportation technologies.
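Of the four metrics named above, ROUGE is the simplest to illustrate. The sketch below computes a ROUGE-1 style unigram precision, recall, and F1 between a generated answer and a reference answer. It is a simplified illustration, not the paper's evaluation code: the whitespace tokenizer, the function name `rouge1`, and the two example sentences are assumptions made here, and standard ROUGE tooling typically adds its own tokenization and optional stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over clipped unigram overlap."""
    # Assumption: naive lowercase whitespace tokenization.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentences, invented for illustration only.
reference = "incidents must be reported within 72 hours"
candidate = "incidents must be reported promptly"
precision, recall, f1 = rouge1(candidate, reference)
```

Here four of the five candidate tokens appear in the seven-token reference, so precision is 4/5 and recall is 4/7. Overlap metrics like this reward shared wording only, which is presumably why the study pairs ROUGE with semantic and consistency-oriented metrics such as BERTScore and AlignScore.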
Problem

Research questions and friction points this paper is trying to address.

Addressing cybersecurity legal gaps in transportation systems
Reducing hallucinations in LLMs for legal content generation
Enhancing factual accuracy in AI-driven legislative analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG-based LLM for legal content extraction
Curated questions reduce LLM hallucinations
Retrieval mechanisms enhance factual grounding
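
The retrieve-then-generate idea behind these points can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' system: it swaps in bag-of-words cosine similarity for the learned embeddings a real RAG pipeline would use, it stops at prompt construction rather than calling an LLM, and the passages, question, and function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Ground the question in the retrieved passages before generation."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical passages, invented for illustration; not real statutes.
PASSAGES = [
    "A manufacturer of connected vehicles shall report a cybersecurity incident within 72 hours.",
    "Operators of automated vehicles must carry proof of insurance.",
    "Vehicle data privacy: driver location data may not be sold without consent.",
]

question = "When must a cybersecurity incident be reported?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)
```

Restricting the model to the retrieved text, and telling it to say so when the text is silent, is the mechanism by which retrieval is expected to reduce hallucination. A curated question set, as described above, would supply the `question` values.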
Authors
Khandakar Ashrafi Akbar
Data Mining Lab, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080
Md Nahiyan Uddin
University of Texas at Dallas
Natural Language Processing (NLP), Large Language Models (LLM), Artificial Intelligence (AI)
Latifur Khan
Professor, University of Texas at Dallas
Data Streams, Big Data Analytics, Text Analytics, Cyber Security, Geographic Data Processing
Trayce Hockstad
University of Alabama
Law, Policy, Transportation
Mizanur Rahman
Assistant Professor in Transportation Systems Engineering, Department of Civil, Construction & Environmental Engineering, The University of Alabama, 2007 SCIB, Box 870205, 248 Kirkbride Lane, Tuscaloosa, AL 35487
Mashrur Chowdhury
Founding Director, National Center for Transportation Cybersecurity and Resiliency
CPS Cybersecurity, Transportation Cyber-Physical-Social Systems, Connected Autonomous Vehicles
B. Thuraisingham
Department of Computer Science, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080