FlowDistill: Scalable Traffic Flow Prediction via Distillation from LLMs

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traffic flow forecasting remains challenging in resource-constrained cities due to complex spatiotemporal dependencies and scarcity of high-quality labeled data. This paper proposes FlowDistill, a lightweight and scalable framework that introduces the first LLM-based knowledge distillation paradigm for traffic forecasting: a fine-tuned large language model serves as the teacher to guide a compact MLP student. By integrating the information bottleneck principle with teacher-constrained regression loss, FlowDistill explicitly models spatiotemporal correlations to improve cross-city generalization. Compared to state-of-the-art methods, FlowDistill reduces training data requirements by 63%, improves prediction accuracy by 2.1%, decreases memory footprint by 42%, and cuts inference latency by 68%, enabling real-time deployment on edge devices. Its core innovation lies in the synergistic co-design of LLM-driven knowledge distillation and traffic-aware lightweight spatiotemporal modeling.

📝 Abstract
Accurate traffic flow prediction is vital for optimizing urban mobility, yet it remains difficult in many cities due to complex spatio-temporal dependencies and limited high-quality data. While deep graph-based models demonstrate strong predictive power, their performance often comes at the cost of high computational overhead and substantial training data requirements, making them impractical for deployment in resource-constrained or data-scarce environments. We propose FlowDistill, a lightweight and scalable traffic prediction framework based on knowledge distillation from large language models (LLMs). In this teacher-student setup, a fine-tuned LLM guides a compact multi-layer perceptron (MLP) student model using a novel combination of the information bottleneck principle and a teacher-bounded regression loss, ensuring the distilled model retains only essential and transferable knowledge. Spatial and temporal correlations are explicitly encoded to enhance the model's generalization across diverse urban settings. Despite its simplicity, FlowDistill consistently outperforms state-of-the-art models in prediction accuracy while requiring significantly less training data and achieving lower memory usage and inference latency, highlighting its efficiency and suitability for real-world, scalable deployment.
Problem

Research questions and friction points this paper is trying to address.

Traffic flow prediction faces complex spatio-temporal dependencies and data scarcity
Deep graph models have high computational costs and data requirements
Lightweight scalable solution needed for resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills traffic prediction from LLMs to MLP
Uses information bottleneck and teacher-bounded loss
Encodes spatio-temporal correlations for generalization
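The teacher-bounded loss mentioned above can be illustrated with a small sketch. This is a hypothetical formulation, not the paper's exact loss: the idea assumed here is that the distillation term only pulls the student toward the teacher where the teacher is actually the better predictor, so a wrong teacher cannot degrade the student below the ground-truth signal. The function name `teacher_bounded_loss` and the `weight` parameter are illustrative, not from the paper.

```python
import numpy as np

def teacher_bounded_loss(student_pred, teacher_pred, target, weight=0.5):
    """Hypothetical sketch of a teacher-bounded regression loss.

    The distillation term is masked to positions where the student's
    error exceeds the teacher's, so the student is never pulled toward
    a teacher that is itself worse than the ground truth.
    """
    # Standard regression loss against the ground truth.
    task_loss = np.mean((student_pred - target) ** 2)

    # Per-element squared errors of student and teacher.
    student_err = (student_pred - target) ** 2
    teacher_err = (teacher_pred - target) ** 2

    # Distill only where the teacher is the better predictor.
    mask = student_err > teacher_err
    distill_loss = np.mean((student_pred - teacher_pred) ** 2 * mask)

    return task_loss + weight * distill_loss
```

With a perfect student the distillation mask is empty and the loss reduces to zero; a student worse than the teacher incurs both the task term and the bounded distillation term.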
Chenyang Yu
Dalian University of Technology
Deep learning, person re-identification
Xinpeng Xie
Department of Computer Science and Engineering, University of North Texas
Yan Huang
Department of Computer Science and Engineering, University of North Texas
Chenxi Qiu
Department of Computer Science and Engineering, University of North Texas