Structure-Aware Multimodal LLM Framework for Trustworthy Near-Field Beam Prediction

2026-03-17
Citations: 0
Influential: 0
AI Summary
This work addresses the inefficiency and misalignment challenges of conventional beam training in near-field ultra-massive MIMO systems, where spherical wavefronts render traditional far-field assumptions invalid in complex three-dimensional low-altitude environments. To overcome these limitations, the paper introduces, for the first time, a multimodal large language model (MLLM) that integrates historical GPS data, RGB images, LiDAR point clouds, and task-oriented textual prompts. By leveraging structure-aware fusion of multi-source information and the reasoning and generalization capabilities of large language models, the proposed approach explicitly models the near-field spherical wave propagation characteristics. This paradigm transcends the constraints of conventional codebook-based beamforming in the joint angle-distance domain, significantly enhancing beam prediction accuracy and environmental understanding while substantially reducing beam training overhead in intricate 3D low-altitude scenarios.

๐Ÿ“ Abstract
In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex three-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are deeply coupled not only with user positions but also with the physical surroundings, precise beam alignment demands profound environmental understanding capabilities. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
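The abstract's point about the codebook growing into the joint angular-distance domain can be illustrated with a small sketch. This is not the paper's method or code. It assumes a uniform linear array with half-wavelength spacing, and every name and number below (the function names, 256 antennas, a 1 cm wavelength, a 5 m user distance, 10 distance samples) is a hypothetical choice made only for illustration. It compares an angle-only (planar-wave) beam with a spherical-wave beam for a user well inside the Rayleigh distance 2D^2/lambda.

```python
import cmath
import math


def far_field_steering(n_antennas, spacing, wavelength, theta):
    """Planar-wave steering vector of a uniform linear array (angle only)."""
    k = 2 * math.pi / wavelength
    center = (n_antennas - 1) / 2
    scale = 1 / math.sqrt(n_antennas)
    return [
        scale * cmath.exp(1j * k * (n - center) * spacing * math.sin(theta))
        for n in range(n_antennas)
    ]


def near_field_steering(n_antennas, spacing, wavelength, theta, distance):
    """Spherical-wave steering vector: depends on both angle and distance."""
    k = 2 * math.pi / wavelength
    center = (n_antennas - 1) / 2
    scale = 1 / math.sqrt(n_antennas)
    vec = []
    for n in range(n_antennas):
        x = (n - center) * spacing  # element position along the array
        # Exact distance from element n to a user at (distance, theta),
        # with theta measured from the array broadside.
        r_n = math.sqrt(distance**2 + x**2 - 2 * distance * x * math.sin(theta))
        vec.append(scale * cmath.exp(-1j * k * (r_n - distance)))
    return vec


def beam_gain(beam, channel):
    """Normalized gain |beam^H channel|^2; equals 1 for a perfect match."""
    return abs(sum(b.conjugate() * c for b, c in zip(beam, channel))) ** 2


def rayleigh_distance(aperture, wavelength):
    """Classical near-field/far-field boundary 2 * D^2 / lambda."""
    return 2 * aperture**2 / wavelength


def codebook_size(n_angles, n_distances=1):
    """Number of candidate beams when sweeping angles and distance samples."""
    return n_angles * n_distances


# Illustrative (assumed) parameters, not taken from the paper.
N, WAVELENGTH = 256, 0.01
SPACING = WAVELENGTH / 2
THETA = math.radians(20)

user = near_field_steering(N, SPACING, WAVELENGTH, THETA, distance=5.0)
angle_only_beam = far_field_steering(N, SPACING, WAVELENGTH, THETA)

print("Rayleigh distance (m):", rayleigh_distance((N - 1) * SPACING, WAVELENGTH))
print("Gain of angle-only beam:", beam_gain(angle_only_beam, user))
print("Gain of matched near-field beam:", beam_gain(user, user))
print("Beams to sweep, angle only:", codebook_size(N))
print("Beams to sweep, angle x distance:", codebook_size(N, 10))
```

Under these assumptions the angle-only beam loses most of its gain for the nearby user, while recovering it requires searching over distance as well as angle. That multiplies the number of candidate beams, which is the training overhead that beam prediction aims to avoid.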
Problem

Research questions and friction points this paper is trying to address.

near-field XL-MIMO
beam prediction
environmental understanding
spherical wavefront
3D low-altitude environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLM
Near-field XL-MIMO
Beam Prediction
Structure-Aware Fusion
Environmental Understanding
Mengyuan Li
University of Southern California
Hardware Security, Trusted Execution Environment, Cloud computing
Qianfan Lu
School of Information Science and Engineering, Southeast University, Nanjing 210096, China
Jiachen Tian
School of Information Science and Engineering, Southeast University, Nanjing 210096, China
Hongjun Hu
School of Information Science and Engineering, Southeast University, Nanjing 210096, China
Yu Han
Southeast University
Traffic control, Traffic flow theory, Model predictive control, Reinforcement learning
Xiao Li
Southeast University
MIMO, RIS, intelligent communications
Chao-kai Wen
Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan
Shi Jin
Southeast University
Wireless Communications, MIMO, 5G Technologies