MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality multimodal data and systematic evaluation benchmarks for multimodal large language models (MLLMs) in the electromagnetic signal domain, as well as the significant performance degradation of existing methods under low signal-to-noise ratio (SNR) conditions. To this end, the authors construct EM-100k, the first large-scale paired electromagnetic signal–text dataset, and introduce EM-Bench, a comprehensive evaluation benchmark. They further propose MERLIN, a training framework that enhances model generalization in low-SNR environments through signal–semantic alignment and robustness augmentation mechanisms. This study establishes the first native MLLM paradigm tailored to the electromagnetic domain, encompassing data curation, benchmarking, and modeling. Experiments demonstrate that MERLIN achieves state-of-the-art performance on EM-Bench and exhibits exceptional robustness and stability under low-SNR conditions.

📝 Abstract
The paradigm of Multimodal Large Language Models (MLLMs) offers a promising blueprint for advancing the electromagnetic (EM) domain. However, prevailing approaches often deviate from the native MLLM paradigm, relying instead on task-specific or pipelined architectures that impose fundamental limitations on model performance and generalization. Fully realizing the potential of MLLMs in the EM domain requires overcoming three main challenges: (1) Data: the scarcity of high-quality datasets pairing EM signals with descriptive text annotations for MLLM pre-training; (2) Benchmark: the absence of comprehensive benchmarks to systematically evaluate and compare model performance on EM signal-to-text tasks; (3) Model: a critical fragility in low Signal-to-Noise Ratio (SNR) environments, where key signal features can be obscured, leading to significant performance degradation. To address these challenges, we introduce a tripartite contribution that establishes a foundation for MLLMs in the EM domain. First, to overcome data scarcity, we construct and release EM-100k, a large-scale dataset comprising over 100,000 EM signal-text pairs. Second, to enable rigorous and standardized evaluation, we propose EM-Bench, the most comprehensive benchmark to date, featuring diverse downstream tasks spanning perception to reasoning. Finally, to tackle the core modeling challenge, we present MERLIN, a novel training framework designed not only to align low-level signal representations with high-level semantic text, but also to explicitly enhance model robustness and performance in challenging low-SNR environments. Comprehensive experiments validate our method, showing that MERLIN achieves state-of-the-art results on EM-Bench and exhibits remarkable robustness in low-SNR settings.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Electromagnetic Signals
Low-SNR Robustness
Dataset Scarcity
Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models
Low-SNR Robustness
EM Signal-Text Alignment
EM-100k Dataset
EM-Bench
Junyu Shen
Tsinghua University
Zhendong She
Tianjin University
Chenghanyu Zhang
Beijing University of Posts and Telecommunications
Yuchuang Sun
Tsinghua University
Luqing Luo
Institute of Microelectronics of the Chinese Academy of Sciences
Dingwei Tan
HKUST (Guangzhou)
Zonghao Guo
University of Chinese Academy of Sciences
Bo Guo
Associate Professor, Hydrology and Atmospheric Sciences, University of Arizona
Flow in porous media · Contaminant transport · Subsurface hydrology · Numerical methods
Zehua Han
Beihang University
Wupeng Xie
Artificial Intelligence Institute of China Electronics Technology Group Corporation
Yaxin Mu
Beijing Information Science and Technology University
Peng Zhang
Professor, Tianjin University
Information Retrieval · Machine Learning · Natural Language Processing
Peipei Li
Beijing University of Posts and Telecommunications (BUPT)
Computer Vision · Image Synthesis · Face Recognition
Fengxiang Wang
National University of Defense Technology
Computer Vision · Remote Sensing
Yangang Sun
Tsinghua University
Maosong Sun
Professor of Computer Science and Technology, Tsinghua University
Natural Language Processing · Artificial Intelligence · Social Computing