AI Summary
This work addresses the inefficiency and misalignment challenges of conventional beam training in near-field ultra-massive MIMO systems, where spherical wavefronts render traditional far-field assumptions invalid in complex three-dimensional low-altitude environments. To overcome these limitations, the paper introduces, for the first time, a multimodal large language model (MLLM) that integrates historical GPS data, RGB images, LiDAR point clouds, and task-oriented textual prompts. By leveraging structure-aware fusion of multi-source information and the reasoning and generalization capabilities of large language models, the proposed approach explicitly models the near-field spherical wave propagation characteristics. This paradigm transcends the constraints of conventional codebook-based beamforming in the joint angle-distance domain, significantly enhancing beam prediction accuracy and environmental understanding while substantially reducing beam training overhead in intricate 3D low-altitude scenarios.
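To make concrete why the codebook expands into the joint angle-distance domain, the textbook spherical-wave (near-field) steering vector for a uniform linear array depends on both the angle and the distance of the user. The expression below is the standard form from the near-field XL-MIMO literature, not quoted from this paper:

```latex
% Standard near-field spherical-wave steering vector for an N-element ULA
% with element spacing d and carrier wavelength \lambda (textbook form,
% not taken from this paper's text).
a(\theta, r) = \frac{1}{\sqrt{N}}
  \left[ e^{-j\frac{2\pi}{\lambda}(r_1 - r)}, \ldots,
         e^{-j\frac{2\pi}{\lambda}(r_N - r)} \right]^{\mathsf{T}},
\qquad
r_n = \sqrt{r^2 + \delta_n^2 d^2 - 2 r \delta_n d \sin\theta},
\qquad
\delta_n = \frac{2n - N - 1}{2}.
```

Because the phase of each element depends on the exact distance r_n rather than only on sin(θ), a near-field codebook must sample both angle and distance, which is what makes exhaustive beam training prohibitively expensive.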
Abstract
In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex three-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are deeply coupled not only with user positions but also with the physical surroundings, precise beam alignment demands profound environmental understanding capabilities. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
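To illustrate the kind of multimodal fusion the abstract describes, the sketch below maps historical GPS, an RGB image, a LiDAR point cloud, and a precomputed text-prompt embedding to logits over a joint angle-distance beam codebook. It is a minimal illustration only: the module names, dimensions, the small transformer standing in for the LLM backbone, and the codebook size (64 angles x 8 distances) are all assumptions, not the paper's actual architecture.

```python
# Illustrative sketch (NOT the paper's architecture): per-modality encoders feed a
# shared token sequence into a small transformer, which predicts a beam index over
# a joint angle-distance codebook. All dimensions and modules are assumptions.
import torch
import torch.nn as nn


class MultimodalBeamPredictor(nn.Module):
    def __init__(self, d_model=256, n_angles=64, n_distances=8):
        super().__init__()
        # Historical GPS: one token per time step.
        self.gps_enc = nn.Sequential(nn.Linear(3, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        # RGB image: tiny CNN pooled to a single token.
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        # LiDAR: per-point MLP, mean-pooled to a single token.
        self.lidar_enc = nn.Sequential(nn.Linear(3, d_model), nn.ReLU())
        # Text prompt: assumes a 768-d embedding precomputed by some text encoder.
        self.text_proj = nn.Linear(768, d_model)
        # Lightweight transformer stands in for the LLM backbone.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Classification head over the joint angle-distance codebook.
        self.head = nn.Linear(d_model, n_angles * n_distances)

    def forward(self, gps_hist, rgb, lidar, prompt_emb):
        # gps_hist: (B, T, 3), rgb: (B, 3, H, W), lidar: (B, N, 3), prompt_emb: (B, 768)
        gps_tok = self.gps_enc(gps_hist)                              # (B, T, d)
        img_tok = self.img_enc(rgb).unsqueeze(1)                      # (B, 1, d)
        lidar_tok = self.lidar_enc(lidar).mean(dim=1, keepdim=True)   # (B, 1, d)
        txt_tok = self.text_proj(prompt_emb).unsqueeze(1)             # (B, 1, d)
        tokens = torch.cat([gps_tok, img_tok, lidar_tok, txt_tok], dim=1)
        fused = self.fusion(tokens)
        return self.head(fused.mean(dim=1))                           # (B, n_angles*n_distances)


# Toy usage: 2 samples, 5 GPS steps, a 64x64 image, 1024 LiDAR points.
model = MultimodalBeamPredictor()
logits = model(torch.randn(2, 5, 3), torch.randn(2, 3, 64, 64),
               torch.randn(2, 1024, 3), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 512])
```

Predicting a beam index directly from sensing data in this way replaces exhaustive sweeps over the angle-distance codebook with a single forward pass, which is the overhead reduction the abstract refers to.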