🤖 AI Summary
Problem: Beam prediction for millimeter-wave (mmWave) MIMO systems in vehicle-to-infrastructure (V2I) cooperative driving remains challenging due to dynamic channel variations and limited training data.
Method: This paper proposes a beam selection framework that fuses multimodal perception data: RGB images, radar point clouds, LiDAR scans, and GPS trajectories. It is the first to apply a large language model (LLM), GPT-2, to this task, integrating it within a multimodal encoding and cross-modal alignment architecture. The framework supports few-shot generalization, and its performance improves as modality diversity increases. End-to-end beam prediction is achieved via supervised fine-tuning (SFT).
Contribution/Results: The method significantly outperforms conventional deep learning baselines in both standard and few-shot settings, achieving substantial gains in prediction accuracy and robustness. It establishes a scalable, adaptive paradigm for intelligent beam management in V2I mmWave communications, enabling reliable high-bandwidth links under real-world mobility constraints.
📝 Abstract
This paper introduces a novel neural network framework called M2BeamLLM for beam prediction in millimeter-wave (mmWave) massive multiple-input multiple-output (mMIMO) communication systems. M2BeamLLM integrates multi-modal sensor data, including images, radar, LiDAR, and GPS, and leverages the reasoning capabilities of large language models (LLMs) such as GPT-2 for beam prediction. By combining sensing data encoding, multimodal alignment and fusion, and supervised fine-tuning (SFT), M2BeamLLM achieves significantly higher beam prediction accuracy and robustness than traditional deep learning (DL) models in both standard and few-shot scenarios. Furthermore, its prediction performance consistently improves with increased diversity in sensing modalities. Our study provides an efficient and intelligent beam prediction solution for vehicle-to-infrastructure (V2I) mmWave communication systems.
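The encode, align, fuse, and predict flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the random linear maps stand in for trained modality encoders, L2 normalization and averaging stand in for the alignment and fusion modules, and the GPT-2 backbone and SFT step are omitted entirely. All names, feature dimensions, and the codebook size (`NUM_BEAMS`, `SHARED_DIM`, `predict_beam`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random

NUM_BEAMS = 8    # hypothetical beam codebook size (assumption)
SHARED_DIM = 4   # hypothetical width of the shared embedding space (assumption)


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a trained encoder or prediction head."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_normalize(vector):
    """Scale to unit length so all modalities share a comparable range."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def fuse(embeddings):
    """Average the aligned embeddings of whichever modalities are present."""
    count = len(embeddings)
    return [sum(values) / count for values in zip(*embeddings)]


def softmax(logits):
    """Numerically stable softmax over beam scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_beam(features, encoders, head):
    """Encode each modality, align, fuse, and score every beam index.

    In the framework described above, the fused representation would pass
    through a fine-tuned GPT-2 backbone; here a single linear head replaces it.
    """
    aligned = [l2_normalize(matvec(encoders[name], vec)) for name, vec in features.items()]
    fused = fuse(aligned)
    probs = softmax(matvec(head, fused))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


rng = random.Random(0)
feature_dims = {"image": 6, "radar": 5, "lidar": 5, "gps": 2}  # toy sizes (assumption)
encoders = {name: make_projection(dim, SHARED_DIM, rng) for name, dim in feature_dims.items()}
head = make_projection(SHARED_DIM, NUM_BEAMS, rng)

sample = {name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name, dim in feature_dims.items()}
beam_index, beam_probs = predict_beam(sample, encoders, head)
print("predicted beam index:", beam_index)
```

Because `fuse` averages over whichever embeddings it receives, the same function accepts any subset of modalities, for example `predict_beam({"gps": sample["gps"]}, encoders, head)`. That mirrors, in a very loose way, the idea of comparing prediction quality as sensing modalities are added or removed.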