PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing challenges in the electromagnetic domain—namely, data scarcity and insufficient integration of domain knowledge—and introduces PReD, the first multimodal foundation model tailored for electromagnetic signals. To support end-to-end learning from raw waveforms to language-driven reasoning, the authors construct PReD-1.3M, a high-quality multitask dataset encompassing time-domain, frequency-domain, and constellation diagram representations, along with PReD-Bench, a comprehensive evaluation benchmark. Leveraging a large language model-based, multi-stage unified training strategy, PReD achieves state-of-the-art performance across diverse tasks including signal detection, modulation recognition, parameter estimation, protocol identification, RF fingerprinting, and anti-jamming decision-making. These results demonstrate the feasibility and promise of vision-aligned foundation models for advancing electromagnetic intelligence.
📝 Abstract
Multimodal Large Language Models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed loop of perception, recognition, and decision-making. We construct a high-quality multitask EM dataset, PReD-1.3M, and an evaluation benchmark, PReD-Bench. The dataset encompasses multi-perspective representations such as raw time-domain waveforms, frequency-domain spectrograms, and constellation diagrams, covering typical features of communication and radar signals. It supports a range of core tasks, including signal detection, modulation recognition, parameter estimation, protocol recognition, radio-frequency fingerprint recognition, and anti-jamming decision-making. PReD adopts a multi-stage training strategy that unifies multiple EM-signal tasks, achieving closed-loop optimization from end-to-end signal understanding to language-driven reasoning and decision-making, and significantly enhancing EM domain expertise while maintaining general multimodal capabilities. Experimental results show that PReD achieves state-of-the-art performance on PReD-Bench, which is constructed from both open-source and self-collected signal datasets. These results collectively validate the feasibility and potential of vision-aligned foundation models in advancing the understanding and reasoning of EM signals.
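The three signal views the abstract names (time-domain waveform, frequency-domain spectrogram, constellation diagram) can all be derived from a single IQ record. The sketch below is not from the paper: it uses a synthetic QPSK burst with assumed parameters (sample rate, samples per symbol, FFT window) purely to illustrate how such multi-perspective representations are computed.

```python
import numpy as np

# Hypothetical QPSK burst (parameters assumed, not from the paper).
rng = np.random.default_rng(0)
n_sym = 256                          # number of QPSK symbols
sps = 8                              # samples per symbol (assumed)
bits = rng.integers(0, 4, n_sym)     # symbol indices 0..3
symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))  # unit-energy QPSK points
iq = np.repeat(symbols, sps)         # rectangular pulse shaping
iq = iq + 0.05 * (rng.standard_normal(iq.size)
                  + 1j * rng.standard_normal(iq.size))  # additive noise

# View 1: time-domain waveform as stacked real/imaginary traces.
waveform = np.stack([iq.real, iq.imag])

# View 2: magnitude spectrogram via a short-time FFT (window of 64 samples).
win = 64
frames = iq[: iq.size // win * win].reshape(-1, win)
spectrogram = np.abs(np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1))

# View 3: constellation diagram -- IQ samples taken at symbol centers.
constellation = iq[sps // 2 :: sps]

print(waveform.shape, spectrogram.shape, constellation.shape)
```

Real pipelines would add pulse shaping, carrier/timing offsets, and channel effects; the point here is only that one raw recording yields the waveform, spectrogram, and constellation views that a multimodal model can consume jointly.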
Problem

Research questions and friction points this paper is trying to address.

electromagnetic perception
multimodal large language models
data scarcity
domain knowledge integration
signal understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
electromagnetic perception
multimodal LLM
signal understanding
closed-loop decision-making
Zehua Han
Jing Xiao
Yiqi Duan
Mengyu Xiang
Yuheng Ji
Institute of Automation, Chinese Academy of Sciences
Embodied AI · Computer Vision
Xiaolong Zheng
Chenghanyu Zhang
Zhendong She
Junyu Shen
Dingwei Tan
Shichu Sun
Zhou Cong
Mingxuan Liu
Fengxiang Wang
National University of Defense Technology
Computer Vision · Remote Sensing
Jinping Sun
Yangang Sun