BeamVLM for Low-altitude Economy: Generative Beam Prediction via Vision-language Models

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing beam prediction methods, which often lack high-level semantic understanding in dynamic environments. To overcome this, we propose the first integration of vision-language models (VLMs) into beam prediction for highly mobile unmanned aerial vehicles (UAVs) communicating with ground base stations. We formulate beam prediction as a visual question answering task, leveraging instruction prompting and an end-to-end mapping from visual patches to the language domain to jointly model fine-grained spatial semantics and high-level reasoning. Evaluated on real-world datasets, our approach significantly outperforms state-of-the-art methods, demonstrating superior accuracy and enhanced generalization across diverse scenarios such as low-altitude economy applications and vehicle-to-infrastructure (V2I) communication.
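As a rough illustration of the visual question answering formulation, the sketch below builds an instruction prompt from a UAV position history and parses a beam index out of free-form generated text. Everything specific here is an assumption for illustration rather than the paper's design: the prompt wording, the `<image>` placeholder token, the 64-beam codebook size, and the helper names `build_beam_prompt` and `parse_beam_index` are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

NUM_BEAMS = 64  # hypothetical codebook size; the source does not state one


def build_beam_prompt(trajectory: List[Tuple[float, float, float]],
                      num_beams: int = NUM_BEAMS) -> str:
    """Format a UAV position history as an instruction prompt (illustrative wording)."""
    last = len(trajectory) - 1
    # Label each sample by how many steps in the past it lies (t-0 is the latest).
    steps = "; ".join(
        f"t-{last - i}: ({x:.1f}, {y:.1f}, {z:.1f})"
        for i, (x, y, z) in enumerate(trajectory)
    )
    return (
        "<image>\n"  # assumed placeholder where the visual tokens would be spliced in
        "You are assisting a ground base station with beam selection.\n"
        f"Recent UAV positions in metres: {steps}.\n"
        f"Question: which beam index in [0, {num_beams - 1}] should be used next? "
        "Answer with a single integer."
    )


def parse_beam_index(answer: str, num_beams: int = NUM_BEAMS) -> Optional[int]:
    """Return the first integer in the generated text if it is a valid beam index."""
    match = re.search(r"\d+", answer)
    if match is None:
        return None
    index = int(match.group())
    return index if 0 <= index < num_beams else None
```

Treating the output as text means the answer has to be validated before use; returning `None` for a missing or out-of-range index lets the caller fall back to another beam selection strategy.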

📝 Abstract
For the low-altitude economy (LAE), fast and accurate beam prediction between high-mobility unmanned aerial vehicles (UAVs) and ground base stations is essential for seamless coverage and reliable communications. However, existing deep learning-based beam prediction methods lack high-level semantic understanding of dynamic environments, resulting in poor generalization. Emerging large language model (LLM) based approaches show promise in enhancing generalization, but they typically lack rich environmental perception and therefore fail to capture the fine-grained spatial semantics essential for precise beam alignment. To tackle these limitations, we propose in this correspondence a novel end-to-end generative framework for beam prediction, called BeamVLM, which treats beam prediction as a visual question answering task and capitalizes on powerful existing vision-language models (VLMs). By projecting raw visual patches directly into the language domain and carefully designing an instructional prompt, the proposed BeamVLM enables the VLM to jointly reason over UAV trajectories and environmental context. Finally, experimental results on real-world datasets demonstrate that the proposed BeamVLM outperforms state-of-the-art methods in prediction accuracy and also exhibits superior generalization to other scenarios such as vehicle-to-infrastructure (V2I) beam prediction.
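To make "projecting raw visual patches into the language domain" concrete, here is a minimal sketch of one common way to do it: cut the image into non-overlapping patches, flatten each one, and apply a linear map into the embedding dimension of the language model. This is a generic patch-embedding illustration under stated assumptions, not the BeamVLM architecture; the grayscale input, the single linear layer, and the function names `split_into_patches` and `project_patches` are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def split_into_patches(image: Matrix, patch: int) -> Matrix:
    """Cut a 2-D grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major, so the result has one row per tile.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch)
                            for c in range(patch)])
    return patches


def project_patches(patches: Matrix, weights: Matrix) -> Matrix:
    """Linearly map each flattened patch to a token embedding.

    weights has shape (patch * patch, embed_dim); the output has one
    embed_dim-long row per patch, playing the role of a visual token.
    """
    embed_dim = len(weights[0])
    return [
        [sum(p[k] * weights[k][d] for k in range(len(p))) for d in range(embed_dim)]
        for p in patches
    ]
```

For example, a 4x4 image with a patch size of 2 yields four visual tokens, which would then be placed alongside the embedded text prompt in the model input. In practice the weights would be learned, and real systems typically use a tensor library rather than nested lists.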
Problem

Research questions and friction points this paper is trying to address.

beam prediction
low-altitude economy
unmanned aerial vehicles
vision-language models
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model
Generative Beam Prediction
Low-altitude Economy
Visual Question Answering
UAV Communication
Chenran Kou
Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
Changsheng You
Southern University of Science and Technology, Clarivate Highly Cited Researcher
Edge computing and intelligence, intelligent reflecting surface, near-field communications, UAV
Mingjiang Wu
Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
Dingzhu Wen
Network Intelligence Center, School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Zezhong Zhang
Research Assistant Professor, The Chinese University of Hong Kong (Shenzhen)
Wireless Communications, Machine Learning, Federated Learning, Massive MIMO
Chengwen Xing
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China