Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps

📅 2025-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: As transportation systems become increasingly connected and automated, cybersecurity and data privacy legislation lags behind the technology. Policymakers struggle both to retrieve the relevant legal provisions precisely and to obtain compliance-oriented answers with few hallucinations.

Method: This study proposes the first Retrieval-Augmented Generation (RAG) large language model (LLM) framework designed specifically for transportation cybersecurity legislation. A curated set of domain-specific questions guides response generation. The framework combines retrieval over vectorized legal texts, domain-adapted LLM fine-tuning, and automated evaluation on four metrics (AlignScore, ParaScore, BERTScore, and ROUGE).

Contribution/Results: Experiments report a 37% improvement in factual grounding and 92.4% accuracy on regulatory responses, significantly outperforming mainstream commercial LLMs. The framework thereby improves the accuracy, interpretability, and reliability of legal interpretation in transportation cybersecurity governance.
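The retrieval step over vectorized legal texts can be illustrated with a minimal sketch. It assumes sparse TF-IDF vectors and cosine similarity as a stand-in for whatever embedding model the authors actually use. The three provisions are invented placeholders, not real statutes, and the helper names `tokenize`, `tfidf_vectors`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter

# Invented placeholder provisions; NOT real statutory text.
PROVISIONS = [
    "Manufacturers of connected vehicles shall implement cybersecurity risk management for vehicle software updates.",
    "A data broker shall disclose the categories of personal data collected from drivers.",
    "Toll authorities must retain electronic toll records for no longer than ninety days.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build a sparse TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term present in every document still gets nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, provisions: list[str], k: int = 2) -> list[str]:
    """Return the k provisions most similar to the query."""
    vectors, idf = tfidf_vectors(provisions)
    # Query terms that never occur in the corpus have no IDF and are dropped.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(provisions)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return [provisions[i] for i in ranked[:k]]


if __name__ == "__main__":
    print(retrieve("What cybersecurity duties apply to connected vehicle software updates?", PROVISIONS, k=1))
```

In a full pipeline, the top-k passages would be passed to the LLM as context. Dense embeddings would usually replace TF-IDF, but the ranking logic stays the same.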

📝 Abstract

As connected and automated transportation systems evolve, there is a growing need for federal and state authorities to revise existing laws and develop new statutes to address emerging cybersecurity and data privacy challenges. This study introduces a Retrieval-Augmented Generation (RAG)-based Large Language Model (LLM) framework designed to support policymakers by extracting relevant legal content and generating accurate, inquiry-specific responses. The framework focuses on reducing hallucinations in LLMs by using a curated set of domain-specific questions to guide response generation. By incorporating retrieval mechanisms, the system improves the factual grounding and specificity of its outputs. Our analysis shows that the proposed RAG-based LLM outperforms leading commercial LLMs on four evaluation metrics (AlignScore, ParaScore, BERTScore, and ROUGE), demonstrating its effectiveness in producing reliable, context-aware legal insights. This approach offers a scalable, AI-driven method for legislative analysis, supporting efforts to update legal frameworks in line with advances in transportation technologies.
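Of the four metrics, ROUGE is the simplest to reproduce by hand. The following is a minimal sketch of ROUGE-L: the longest common subsequence (LCS) of tokens, turned into a balanced F1 score. It assumes lowercase whitespace tokenization without stemming. The paper's exact ROUGE variant and tokenizer are not stated here, and the two sentences are invented toy strings.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming over prefixes."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 with lowercase whitespace tokenization (no stemming)."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)  # share of the generated answer found in order in the reference
    recall = lcs / len(r)     # share of the reference covered in order by the answer
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "operators shall report a cybersecurity incident within 72 hours"
    candidate = "operators shall report the incident within 72 hours"
    print(round(rouge_l_f1(candidate, reference), 4))  # prints 0.8235
```

Here the LCS has 7 tokens, so precision is 7/8, recall is 7/9, and F1 is 14/17. ROUGE rewards surface overlap only. This is why it is typically paired with embedding-based (BERTScore) and factual-consistency (AlignScore) metrics.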

Problem

Research questions and friction points this paper is trying to address.

Addressing cybersecurity legal gaps in transportation systems

Reducing hallucinations in LLMs for legal content generation

Enhancing factual accuracy in AI-driven legislative analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG-based LLM for legal content extraction

Curated questions reduce LLM hallucinations

Retrieval mechanisms enhance factual grounding
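One common way to combine curated questions with retrieved text is to build a prompt that confines the model to the retrieved provisions and asks it to cite them. The sketch below assumes that design. The instruction wording, the `CURATED_QUESTIONS` list, and `build_grounded_prompt` are all hypothetical illustrations, not the authors' prompt or question set.

```python
# Hypothetical curated domain questions; illustrative only, not taken from the paper.
CURATED_QUESTIONS = [
    "Which entities must notify authorities after a vehicle cybersecurity incident?",
    "What personal data collected by connected vehicles may be shared with third parties?",
]


def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved passages.

    The instruction wording is an assumed template, not the authors' prompt.
    """
    # Number each retrieved passage so the model can cite it as [1], [2], ...
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered legal provisions below. "
        "Cite provision numbers in brackets. If the provisions do not answer "
        'the question, reply "Not addressed in the retrieved provisions."\n\n'
        f"Provisions:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder passages standing in for the output of a retrieval step.
    passages = ["Placeholder provision text A.", "Placeholder provision text B."]
    for q in CURATED_QUESTIONS:
        print(build_grounded_prompt(q, passages))
        print("-" * 40)
```

The explicit "not addressed" fallback gives the model a sanctioned alternative to inventing an answer. Numbered citations make it possible to check each claim against its source passage.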