PanLUNA: An Efficient and Robust Query-Unified Multimodal Model for Edge Biosignal Intelligence

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing foundation models for physiological signals, which are often confined to single modalities and struggle with efficient, robust multimodal fusion and edge deployment under data-scarce conditions. The authors propose a lightweight multimodal foundation model with only 5.4 million parameters that enables early cross-modal fusion of EEG, ECG, and PPG through a shared encoder augmented with sensor-type embeddings and a unified query mechanism, offering robustness to missing modalities during inference. Integrated with a channel-unification module, query-set augmentation, INT8 quantization-aware training, and RISC-V deployment optimizations, the model achieves a balanced accuracy of 81.21% on TUAB abnormal EEG detection and a state-of-the-art balanced accuracy of 0.7416 on HMC sleep staging, while enabling real-time inference on a GAP9 chip at 325.6 ms latency and only 18.8 mJ of energy per ECG inference.
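The fusion mechanism described above can be sketched in a few lines. This is a hypothetical illustration (names, dimensions, and channel counts are assumed, not taken from the paper's code): each channel of each available modality becomes a token, a sensor-type embedding is added to tag its modality, and a fixed query set cross-attends over all tokens, so the output shape is the same no matter which modalities are present.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16           # token/embedding dimension (illustrative)
N_QUERIES = 4    # size of the unified query set (illustrative)

# "Learned" parameters, randomly initialized here for the sketch.
sensor_type_emb = {m: rng.normal(size=D) for m in ("EEG", "ECG", "PPG")}
queries = rng.normal(size=(N_QUERIES, D))

def fuse(channels_by_modality):
    """Cross-attend a fixed query set over whatever channel tokens exist.

    channels_by_modality: dict mapping modality name -> (n_channels, D) array.
    A missing modality is simply absent from the dict, so inference stays
    well-defined without it.
    """
    tokens = []
    for modality, chans in channels_by_modality.items():
        tokens.append(chans + sensor_type_emb[modality])  # tag channel tokens
    tokens = np.concatenate(tokens, axis=0)               # (n_total, D)

    # Scaled dot-product cross-attention: queries attend over channel tokens.
    scores = queries @ tokens.T / np.sqrt(D)              # (N_QUERIES, n_total)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens                               # (N_QUERIES, D)

# Full multimodal input vs. PPG missing: the output shape is identical.
full = fuse({"EEG": rng.normal(size=(19, D)),
             "ECG": rng.normal(size=(12, D)),
             "PPG": rng.normal(size=(2, D))})
partial = fuse({"EEG": rng.normal(size=(19, D)),
                "ECG": rng.normal(size=(12, D))})
print(full.shape, partial.shape)  # → (4, 16) (4, 16)
```

Because the query set, not the input channels, fixes the output size, downstream heads need no changes when a sensor drops out, which is what makes the model inherently robust to missing modalities at inference time.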
📝 Abstract
Physiological foundation models (FMs) have shown promise for biosignal representation learning, yet most remain confined to a single modality such as EEG, ECG, or PPG, largely because paired multimodal datasets are scarce. In this paper, we present PanLUNA, a compact 5.4M-parameter pan-modal FM that jointly processes EEG, ECG, and PPG within a single shared encoder. Extending LUNA's channel-unification module, PanLUNA treats multimodal channels as entries in a unified query set augmented with sensor-type embeddings, enabling efficient cross-modal early fusion while remaining inherently robust to missing modalities at inference time. Despite its small footprint, PanLUNA matches or exceeds models up to 57$\times$ larger: 81.21% balanced accuracy on TUAB abnormal EEG detection and state-of-the-art 0.7416 balanced accuracy on HMC multimodal sleep staging. Quantization-aware training with INT8 weights recovers $\geq$96% of full-precision performance, and deployment on the GAP9 ultra-low-power RISC-V microcontroller for wearables achieves 325.6 ms latency and 18.8 mJ per 10-second, 12-lead ECG inference, and 1.206 s latency at 68.65 mJ for multimodal 5-channel sleep staging over 30-second epochs.
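The INT8 quantization-aware training mentioned in the abstract can be illustrated with a generic fake-quantization step (a standard formulation, not the paper's exact recipe): in the forward pass, weights are rounded to an int8 grid and dequantized back to float, so the network learns to tolerate the rounding error it will face after real INT8 deployment; during backpropagation a straight-through estimator typically passes gradients through the rounding, which is omitted here.

```python
import numpy as np

def fake_quant_int8(w):
    """Symmetric per-tensor quantize-dequantize to the int8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    if scale == 0.0:
        return w.copy()
    q = np.clip(np.round(w / scale), -127, 127)   # snap to the integer grid
    return q * scale                              # back to float for training

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(64, 64))          # toy weight matrix
w_q = fake_quant_int8(w)

# The quantized weights lie on at most 255 distinct levels, and the
# round-trip error is bounded by half a quantization step.
step = np.abs(w).max() / 127.0
print(len(np.unique(w_q)) <= 255)                 # → True
print(np.max(np.abs(w - w_q)) <= step / 2 + 1e-12)  # → True
```

The paper's $\geq$96% recovery of full-precision performance suggests the model's representations survive this coarse 255-level grid, which is what makes the microcontroller deployment figures achievable.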
Problem

Research questions and friction points this paper is trying to address.

multimodal biosignal
physiological foundation model
missing modalities
edge intelligence
cross-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

query-unified multimodal fusion
physiological foundation model
modality-robust inference
edge biosignal intelligence
quantization-aware training