Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

📅 2026-01-21
🤖 AI Summary
This work addresses the challenges of high communication overhead and insufficient robustness to channel fluctuations and noisy inputs in multimodal inference under bandwidth-constrained wireless edge environments. The authors propose a three-stage communication-aware distributed learning framework: first, local multimodal self-supervised pretraining initializes encoders without device–server interaction; second, evidence-based fusion enables distributed fine-tuning; and third, an uncertainty-guided feedback mechanism dynamically balances communication efficiency and inference accuracy. By integrating uncertainty awareness into multimodal edge inference for the first time, the method significantly reduces communication rounds and improves accuracy on RGB–depth indoor scene classification tasks, while demonstrating superior robustness over existing self-supervised and fully supervised approaches under modality dropout or channel perturbations.
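The centralized evidential fusion in the second stage can be sketched with the reduced Dempster combination rule commonly used in evidential deep learning — an assumption for illustration; the paper's exact fusion rule is not reproduced here, and the function names (`opinion`, `fuse`) are hypothetical:

```python
import numpy as np

def opinion(evidence):
    # Convert non-negative per-class evidence into a subjective-logic opinion:
    # Dirichlet parameters alpha = evidence + 1, belief masses, and an
    # uncertainty mass u that grows when total evidence is scarce.
    K = len(evidence)
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    u = K / S
    return belief, u

def fuse(b1, u1, b2, u2):
    # Reduced Dempster combination of two modality opinions: conflict C
    # between disagreeing beliefs is measured, then beliefs reinforce each
    # other while an uncertain modality defers to the confident one.
    C = sum(b1[i] * b2[j]
            for i in range(len(b1))
            for j in range(len(b2)) if i != j)
    scale = 1.0 / (1.0 - C)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u
```

A useful property of this rule is that a vacuous opinion (zero evidence, u = 1) leaves the other modality's opinion unchanged, which is what makes the fusion robust to a degraded or dropped modality.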

📝 Abstract
Semantic communication is emerging as a key enabler for distributed edge intelligence due to its capability to convey task-relevant meaning. However, achieving communication-efficient training and robust inference over wireless links remains challenging. This challenge is further exacerbated for multi-modal edge inference (MMEI) by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the multi-modal nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, we propose a three-stage communication-aware distributed learning framework to improve training and inference efficiency while maintaining robustness over wireless channels. In Stage I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device–server exchange, thereby reducing the communication cost. In Stage II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, optimizing the communication–accuracy tradeoff in the distributed setting. Experiments on RGB–depth indoor scene classification show that the proposed framework attains higher accuracy with far fewer training communication rounds and remains robust to modality degradation or channel variation, outperforming existing self-supervised and fully supervised baselines.
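The Stage III feedback loop can be illustrated with a minimal sketch, assuming subjective-logic uncertainty from accumulated Dirichlet evidence and simple additive aggregation in place of the paper's fusion rule — both assumptions, and the names (`uncertainty_guided_rounds`, the threshold value) are illustrative:

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    # Uncertainty mass of a K-class Dirichlet with parameters evidence + 1:
    # close to 1 when evidence is scarce, shrinking as evidence accumulates.
    K = len(evidence)
    return K / (evidence.sum() + K)

def uncertainty_guided_rounds(per_modality_evidence, threshold=0.3):
    # Transmit modality features one round at a time (cheapest first) and
    # stop requesting feedback once the server-side uncertainty falls
    # below the threshold, trading communication for confidence.
    total = np.zeros_like(per_modality_evidence[0])
    for rounds, ev in enumerate(per_modality_evidence, start=1):
        total += ev                                  # server aggregates received evidence
        if dirichlet_uncertainty(total) <= threshold:
            break                                    # confident: skip remaining requests
    prediction = int(np.argmax(total + 1.0))         # argmax of Dirichlet mean
    return prediction, rounds
```

Under this sketch, a confident first modality ends inference after one round, while an ambiguous sample triggers extra feedback rounds — the communication–accuracy tradeoff the abstract describes.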
Problem

Research questions and friction points this paper is trying to address.

multi-modal edge inference
communication efficiency
wireless channels
robustness
distributed learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-Aware Learning
Multi-Modal Edge Inference
Semantic Communication
Distributed Self-Supervised Learning
Evidential Fusion
Hang Zhao
Dept. of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
Hongru Li
HKUST
Wireless Communication · Semantic Communication
Dongfang Xu
The Hong Kong University of Science and Technology
Wireless Communications
Shenghui Song
The Hong Kong University of Science and Technology
Information Theory · Distributed Intelligence · ML for Communication · Integrated Sensing and Communication
K. B. Letaief
Dept. of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong