🤖 AI Summary
Single-frame visual inputs exhibit poor robustness in complex scenes, while state-of-the-art temporal models suffer from excessive computational overhead, hindering their deployment in federated learning (FL) settings.
Method: We propose a lightweight Temporal Transformer decomposition framework that factorizes the global attention matrix into low-rank components, drastically reducing parameter count and computational complexity. We further design an FL-aware distributed training strategy enabling efficient parameter aggregation and real-time inference. The model jointly processes multi-frame images and steering sequences to achieve accurate temporal modeling and cross-modal feature fusion under resource constraints.
Results: Our approach outperforms existing SOTA methods on three benchmark datasets, achieving significant accuracy gains and inference latency below 30 ms. Extensive real-world robotic experiments validate its practical deployability and strong generalization capability in heterogeneous edge environments.
📝 Abstract
Traditional vision-based autonomous driving systems often struggle to navigate complex environments when relying solely on single-image inputs. To overcome this limitation, incorporating temporal data, such as past image frames or steering sequences, has proven effective in enhancing robustness and adaptability in challenging scenarios. Although high-performance methods exist, they often rely on resource-intensive fusion networks, making them costly to train and unsuitable for federated learning. To address these challenges, we propose lightweight temporal transformer decomposition, a method that processes sequential image frames and temporal steering data by breaking down large attention maps into smaller matrices. This approach reduces model complexity, enabling efficient weight updates for convergence and real-time prediction while leveraging temporal information to enhance autonomous driving performance. Extensive experiments on three datasets demonstrate that our method outperforms recent approaches by a clear margin while achieving real-time performance. Real-robot experiments further confirm the effectiveness of our method.
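The core idea of decomposing a large attention map into smaller matrices can be illustrated with a minimal low-rank attention sketch. This is not the paper's actual implementation: the function name, shapes, and the random projection (here fixed rather than learned, in the spirit of Linformer-style key/value projection) are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lowrank_attention(Q, K, V, r, seed=0):
    """Hypothetical sketch of low-rank attention factorization.

    Instead of the full n x n attention map softmax(Q K^T / sqrt(d)),
    keys and values are projected down to r rows, so the attention
    matrix is only n x r. Cost drops from O(n^2 d) to O(n r d).
    The projection P is random here; in practice it would be learned.
    """
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n, r)) / np.sqrt(r)  # n x r projection
    K_r = P.T @ K                                  # r x d compressed keys
    V_r = P.T @ V                                  # r x d compressed values
    A = softmax(Q @ K_r.T / np.sqrt(d))            # n x r attention map
    return A @ V_r                                 # n x d output

# Toy usage: 16 temporal tokens of dimension 32, compressed to rank 8.
rng = np.random.default_rng(1)
Q = rng.standard_normal((16, 32))
K = rng.standard_normal((16, 32))
V = rng.standard_normal((16, 32))
out = lowrank_attention(Q, K, V, r=8)  # shape (16, 32)
```

With sequence length n and rank r << n, the attention map shrinks from n x n to n x r, which is the kind of parameter and compute reduction the abstract describes for resource-constrained federated settings.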