🤖 AI Summary
This work addresses the challenge of beam prediction in millimeter-wave communications for highly mobile uncrewed aerial vehicle (UAV) scenarios, where high-frequency channel characteristics complicate reliable forecasting. To tackle this, the authors propose an embodied intelligence–oriented multi-agent collaborative reasoning framework that decomposes the prediction task into three stages: analysis, planning, and evaluation. A hybrid model system with dynamic data-flow switching is designed, integrating Mamba-based temporal modeling, convolutional visual encoding, and a cross-attention multimodal fusion mechanism. This architecture mitigates the limitations of large language models in context length and controllability. Evaluated on a real-world UAV millimeter-wave dataset, the proposed method achieves a top-1 prediction accuracy of 96.57%, demonstrating significant improvements in both prediction accuracy and robustness.
📝 Abstract
Millimeter-wave or terahertz communications can meet the demands of low-altitude economy networks for high-throughput sensing and real-time decision making. However, the high-frequency characteristics of wireless channels result in severe propagation loss and strong beam directivity, which make beam prediction challenging in highly mobile uncrewed aerial vehicle (UAV) scenarios. In this paper, we employ agentic AI to move mmWave base stations toward embodied intelligence. We design a multi-agent collaborative reasoning architecture for UAV-to-ground mmWave communications and propose a hybrid beam prediction model system based on bimodal data. The multi-agent architecture overcomes the limited context window and weak controllability of large language model (LLM)-based reasoning by decomposing beam prediction into task analysis, solution planning, and completeness assessment. To align with the agentic reasoning process, a hybrid beam prediction model system is developed to process multimodal UAV data, including numeric mobility information and visual observations. The proposed hybrid model system integrates Mamba-based temporal modeling, convolutional visual encoding, and cross-attention-based multimodal fusion, and dynamically switches data-flow strategies under multi-agent guidance. Extensive simulations on a real UAV mmWave communication dataset demonstrate that the proposed architecture and system achieve high prediction accuracy and robustness under diverse data conditions, with a maximum top-1 accuracy of 96.57%.
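The cross-attention multimodal fusion described above can be sketched in miniature: temporal (mobility) features act as queries attending over visual patch features, with a residual connection back to the temporal stream. This is a minimal illustrative sketch in NumPy, not the paper's implementation; the shapes, single-head attention, and the names `cross_attention_fuse`, `temporal`, and `visual` are assumptions for illustration only (the paper's Mamba and convolutional encoders are replaced here by random stand-in features).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(temporal_feats, visual_feats, Wq, Wk, Wv):
    """Fuse mobility features (queries) with visual features (keys/values).

    temporal_feats: (T, d) stand-in for a Mamba-style temporal encoding
    visual_feats:   (P, d) stand-in for conv features over P image patches
    """
    Q = temporal_feats @ Wq                      # (T, d) queries
    K = visual_feats @ Wk                        # (P, d) keys
    V = visual_feats @ Wv                        # (P, d) values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # scaled dot-product scores
    attn = softmax(scores, axis=-1)              # each time step attends over patches
    return temporal_feats + attn @ V             # residual fusion, shape (T, d)

rng = np.random.default_rng(0)
d = 16
temporal = rng.standard_normal((8, d))           # 8 time steps of mobility features
visual = rng.standard_normal((49, d))            # e.g. a 7x7 grid of patch features
Wq = rng.standard_normal((d, d)) * 0.1
Wk = rng.standard_normal((d, d)) * 0.1
Wv = rng.standard_normal((d, d)) * 0.1
fused = cross_attention_fuse(temporal, visual, Wq, Wk, Wv)
print(fused.shape)  # (8, 16): one fused vector per time step
```

A downstream beam classifier would then map each fused vector (or the last one) to logits over the beam codebook; the multi-agent layer in the paper decides which modalities and data-flow path feed this fusion step.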