ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current expressway operations, which rely on rule-based and isolated models that struggle to integrate cross-system knowledge, while general-purpose large language models lack sufficient understanding of regulations and causal relationships in anomalous scenarios. To overcome these challenges, we propose ExpressMind, a multimodal large language model tailored for intelligent expressways. We construct the first full-stack multimodal dataset for expressway environments and introduce a dual-phase pretraining paradigm combining self-supervised and unsupervised learning. ExpressMind integrates graph-augmented retrieval-augmented generation (RAG) with a reinforcement learning-aligned chain-of-thought (RL-CoT) mechanism to enable deep cross-modal comprehension of text, images, and videos. Evaluated on our newly established benchmark, ExpressMind significantly outperforms existing methods in event detection, safety response generation, and complex traffic analysis tasks.

📝 Abstract
Current expressway operations rely on rule-based, isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent transportation, advancing traffic models from algorithmic to cognitive intelligence. However, general-purpose LLMs cannot effectively understand the regulations and causal relationships governing unconventional events in the expressway domain. This paper therefore presents ExpressMind, a pre-trained multimodal large language model (MLLM) that serves as the cognitive core for intelligent expressway operations. To overcome data scarcity, we construct the industry's first full-stack expressway dataset, encompassing traffic knowledge texts, emergency reasoning chains, and annotated video events. We propose a dual-layer LLM pre-training paradigm combining self-supervised and unsupervised learning, and introduce a Graph-Augmented RAG framework that dynamically indexes the expressway knowledge base. To strengthen reasoning over incident-response strategies, we develop an RL-aligned Chain-of-Thought (RL-CoT) mechanism that enforces consistency between model reasoning and expert problem-solving heuristics for incident handling. Finally, ExpressMind integrates a cross-modal encoder that aligns dynamic feature sequences across the visual and textual channels, enabling it to understand traffic scenes in both video and image modalities. Extensive experiments on our newly released multimodal expressway benchmark demonstrate that ExpressMind comprehensively outperforms existing baselines in event detection, safety response generation, and complex traffic analysis. The code and data are available at: https://wanderhee.github.io/ExpressMind/.
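The graph-augmented RAG component described above can be illustrated with a minimal sketch: knowledge entries (regulations, incident records) become graph nodes, edges link causally related entities, and retrieval expands from keyword-matched nodes to their neighbors so the prompt handed to the LLM carries causal context, not just keyword hits. This is a toy example, not the authors' implementation; the graph contents, node names, and `retrieve_context` function are hypothetical.

```python
# Toy graph-augmented retrieval sketch (hypothetical data, for illustration only).
from collections import deque

# Hypothetical knowledge graph: node -> (text snippet, neighbor nodes)
KNOWLEDGE_GRAPH = {
    "lane_closure": ("Close affected lanes and deploy cones well upstream.",
                     ["rear_end_collision", "speed_limit"]),
    "rear_end_collision": ("Rear-end collisions require immediate lane closure "
                           "and medical dispatch.", ["lane_closure"]),
    "speed_limit": ("Reduce the variable speed limit near active incidents.",
                    []),
}

def retrieve_context(query: str, hops: int = 1) -> list[str]:
    """Match query keywords to graph nodes, then expand `hops` neighbors (BFS)."""
    seeds = [n for n in KNOWLEDGE_GRAPH if n.replace("_", " ") in query.lower()]
    seen, queue = set(seeds), deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth < hops:
            for neighbor in KNOWLEDGE_GRAPH[node][1]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    # Return the text of every matched node plus its expanded neighborhood.
    return [KNOWLEDGE_GRAPH[n][0] for n in sorted(seen)]

if __name__ == "__main__":
    for snippet in retrieve_context("Protocol after a rear end collision?"):
        print(snippet)
```

The one-hop expansion is what distinguishes this from plain keyword RAG: the query only mentions the collision, but the retrieved context also carries the linked lane-closure procedure.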
Problem

Research questions and friction points this paper is trying to address.

expressway operation
multimodal large language model
cognitive intelligence
unconventional scenarios
cross-system knowledge analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Model
Graph-Augmented RAG
RL-aligned Chain-of-Thought
Dual-layer Pre-training
Cross-modal Encoder