Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety

📅 2025-04-09
🤖 AI Summary
Problem: Regulatory compliance in hazardous materials transportation requires precise answers to high-stakes regulatory queries, yet existing LLMs lack domain-specific reliability. Method: We evaluate a retrieval-augmented generation (RAG) system built on LLaMA 2 and LLaMA and tailored to U.S. federal and state hazardous materials regulations, which we believe is the first known application of RAG in transportation safety. Using 100 realistic queries on route planning and permitting requirements, we combine semantic similarity metrics with qualitative ratings of accuracy, detail, and relevance. Contribution/Results: The RAG-augmented LLaMA models significantly outperform ChatGPT and Vertex AI, giving more detailed and generally accurate answers despite occasional inconsistencies. The results support the practical value of RAG in high-stakes regulatory domains when it is paired with domain-specific fine-tuning and rigorous evaluation.

📝 Abstract
Applications of generative Large Language Models (LLMs) are rapidly expanding across various domains, promising significant improvements in workflow efficiency and information retrieval. However, their implementation in specialized, high-stakes domains such as hazardous materials transportation is challenging due to accuracy and reliability concerns. This study evaluates three fine-tuned generative models (ChatGPT, Google's Vertex AI, and ORNL's LLaMA 2 and LLaMA augmented with Retrieval-Augmented Generation, or RAG) on retrieving regulatory information essential for hazardous material transportation compliance in the United States. Using approximately 40 publicly available federal and state regulatory documents, we developed 100 realistic queries relevant to route planning and permitting requirements. Responses were qualitatively rated on accuracy, detail, and relevance, and these ratings were complemented by quantitative assessments of semantic similarity between model outputs. Results demonstrated that the RAG-augmented LLaMA models significantly outperformed Vertex AI and ChatGPT, providing more detailed and generally accurate information despite occasional inconsistencies. This research introduces the first known application of RAG in transportation safety and emphasizes the need for domain-specific fine-tuning and rigorous evaluation methodologies to ensure reliability and minimize the risk of inaccuracies in high-stakes environments.
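
The abstract does not say which semantic similarity metric was used. As a minimal sketch, assuming a bag-of-words cosine similarity as a simple stand-in for an embedding-based metric, pairwise similarity between model outputs could be computed as follows. The function names, model labels, and sample responses are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty response is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def pairwise_similarity(responses):
    """Map each unordered pair of model names to the similarity of their responses."""
    return {
        (a, b): cosine_similarity(responses[a], responses[b])
        for a, b in combinations(sorted(responses), 2)
    }


# Hypothetical responses to a single query (assumption, for illustration only).
responses = {
    "model_a": "A permit is required before transporting hazardous materials through the tunnel.",
    "model_b": "Transporting hazardous materials through the tunnel requires a permit.",
    "model_c": "Check local weather conditions before departure.",
}

for pair, score in pairwise_similarity(responses).items():
    print(pair, round(score, 3))
```

Word-overlap scores like this miss paraphrases that share no vocabulary, which is one reason an embedding-based metric is usually preferred for comparing free-text answers.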

Problem

Research questions and friction points this paper is trying to address.

Evaluating generative models for hazardous materials transportation queries
Assessing accuracy and reliability of LLMs in regulatory document retrieval
Comparing RAG-augmented models with ChatGPT and Vertex AI performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG-augmented LLaMA models for accurate retrieval
Domain-specific fine-tuning for transportation safety
Qualitative and quantitative evaluation methodologies
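
As a rough illustration of the retrieval step in a RAG pipeline, the sketch below ranks regulatory excerpts against a query with a smoothed TF-IDF score and assembles a prompt for a generator model. It is not the ORNL system's implementation, which this summary does not describe. The function names, the scoring formula, and the sample excerpts are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(query, passages, top_k=2):
    """Return up to top_k passages with a positive TF-IDF overlap score."""
    tokenized = [tokenize(p) for p in passages]
    n = len(passages)
    # Document frequency: number of passages that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency (assumed formula).
        score = sum(
            counts[t] * math.log((1 + n) / (1 + doc_freq[t]))
            for t in query_terms
            if t in counts
        )
        scored.append((score, index))
    # Highest score first; ties keep the original passage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passages[i] for score, i in scored[:top_k] if score > 0]


def build_prompt(query, retrieved):
    """Assemble a prompt that asks the generator to answer from the excerpts."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Answer using only the regulatory excerpts below.\n"
        f"{context}\n"
        f"Question: {query}\n"
    )


# Placeholder excerpts written for illustration; not quotations from any regulation.
passages = [
    "Carriers must obtain a state permit before moving radioactive materials on designated highways.",
    "Drivers must complete hazardous materials training on a recurring schedule.",
    "Preferred routes for radioactive materials follow interstate highways unless a state designates an alternative.",
]
query = "Which permit is needed to move radioactive materials?"

retrieved = rank_passages(query, passages)
print(build_prompt(query, retrieved))
```

The generator call itself is left out because the model interface is not specified here. In a full pipeline, the assembled prompt would be passed to the language model, and the retrieved excerpts could be kept alongside the answer so a reviewer can check each claim against its source.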