FlowDistill: Scalable Traffic Flow Prediction via Distillation from LLMs

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traffic flow forecasting remains challenging in resource-constrained cities due to complex spatiotemporal dependencies and scarcity of high-quality labeled data. This paper proposes FlowDistill, a lightweight and scalable framework that introduces the first LLM-based knowledge distillation paradigm for traffic forecasting: a fine-tuned large language model serves as the teacher to guide a compact MLP student. By integrating the information bottleneck principle with teacher-constrained regression loss, FlowDistill explicitly models spatiotemporal correlations to improve cross-city generalization. Compared to state-of-the-art methods, FlowDistill reduces training data requirements by 63%, improves prediction accuracy by 2.1%, decreases memory footprint by 42%, and cuts inference latency by 68%, enabling real-time deployment on edge devices. Its core innovation lies in the synergistic co-design of LLM-driven knowledge distillation and traffic-aware lightweight spatiotemporal modeling.

📝 Abstract
Accurate traffic flow prediction is vital for optimizing urban mobility, yet it remains difficult in many cities due to complex spatio-temporal dependencies and limited high-quality data. While deep graph-based models demonstrate strong predictive power, their performance often comes at the cost of high computational overhead and substantial training data requirements, making them impractical for deployment in resource-constrained or data-scarce environments. We propose FlowDistill, a lightweight and scalable traffic prediction framework based on knowledge distillation from large language models (LLMs). In this teacher-student setup, a fine-tuned LLM guides a compact multi-layer perceptron (MLP) student model using a novel combination of the information bottleneck principle and a teacher-bounded regression loss, ensuring the distilled model retains only essential and transferable knowledge. Spatial and temporal correlations are explicitly encoded to enhance the model's generalization across diverse urban settings. Despite its simplicity, FlowDistill consistently outperforms state-of-the-art models in prediction accuracy while requiring significantly less training data and achieving lower memory usage and inference latency, highlighting its efficiency and suitability for real-world, scalable deployment.
Problem

Research questions and friction points this paper is trying to address.

Traffic flow prediction faces complex spatio-temporal dependencies and data scarcity
Deep graph models have high computational costs and data requirements
Lightweight scalable solution needed for resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills traffic prediction from LLMs to MLP
Uses information bottleneck and teacher-bounded loss
Encodes spatio-temporal correlations for generalization
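The teacher-bounded loss mentioned above can be illustrated with a small sketch. This is a hypothetical formulation, not the paper's exact loss: the idea assumed here is that the distillation term only pulls the student toward the teacher where the teacher is actually the better predictor, so a wrong teacher cannot degrade the student below the ground-truth signal. The function name `teacher_bounded_loss` and the `weight` parameter are illustrative, not from the paper.

```python
import numpy as np

def teacher_bounded_loss(student_pred, teacher_pred, target, weight=0.5):
    """Hypothetical sketch of a teacher-bounded regression loss.

    The distillation term is masked to positions where the student's
    error exceeds the teacher's, so the student is never pulled toward
    a teacher that is itself worse than the ground truth.
    """
    # Standard regression loss against the ground truth.
    task_loss = np.mean((student_pred - target) ** 2)

    # Per-element squared errors of student and teacher.
    student_err = (student_pred - target) ** 2
    teacher_err = (teacher_pred - target) ** 2

    # Distill only where the teacher is the better predictor.
    mask = student_err > teacher_err
    distill_loss = np.mean((student_pred - teacher_pred) ** 2 * mask)

    return task_loss + weight * distill_loss
```

With a perfect student the distillation mask is empty and the loss reduces to zero; a student worse than the teacher incurs both the task term and the bounded distillation term.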
Chenyang Yu
Dalian University of Technology
Deep learning, person re-identification
Xinpeng Xie
Department of Computer Science and Engineering, University of North Texas
Yan Huang
Department of Computer Science and Engineering, University of North Texas
Chenxi Qiu
Department of Computer Science and Engineering, University of North Texas