M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices

📅 2026-05-10

🤖 AI Summary

Most existing air quality prediction methods rely on centralized learning and struggle to balance privacy preservation, communication overhead, and multimodal data fusion in distributed IoT environments. This work proposes M$^2$FedAQI, a lightweight multimodal federated learning framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The framework fuses visual and tabular data through a feature-modulation mechanism that enables cross-modal interaction at low computational cost, and it adds TLS-based authentication to secure client participation and the communication channel. Evaluated on the PM25Vision and TRAQID datasets, the model outperforms the strongest baselines by up to 11.0% in accuracy, 3.53% in AUC, 12.2% in F1-score, and 18.0% in R², and reduces MAE and RMSE by up to 25.4% and 20.4%, respectively, while staying within edge-device resource constraints.
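To make "fusion through feature modulation" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a FiLM-style scheme in which the tabular features produce a per-channel scale and shift that are applied to the visual embedding. The function names, dimensions, and the `(1 + gamma)` residual form are all illustrative assumptions.

```python
import random

def linear(x, weights, bias):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def modulate(visual, tabular, w_gamma, b_gamma, w_beta, b_beta):
    """Assumed FiLM-style fusion: tabular features scale and shift visual ones."""
    gamma = linear(tabular, w_gamma, b_gamma)  # per-channel scale
    beta = linear(tabular, w_beta, b_beta)     # per-channel shift
    # (1 + gamma) keeps the visual features unchanged when gamma and beta are 0.
    return [(1.0 + g) * v + b for g, v, b in zip(gamma, visual, beta)]

def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
visual = [0.5, -1.2, 0.3, 2.0]   # hypothetical image embedding, 4 channels
tabular = [0.7, 0.1, -0.4]       # hypothetical normalized sensor readings
zeros = [0.0] * 4
w_gamma, w_beta = init(4, 3, rng), init(4, 3, rng)

fused = modulate(visual, tabular, w_gamma, zeros, w_beta, zeros)
```

The appeal of this kind of fusion on edge devices is that it adds only two small linear maps on top of the unimodal encoders, rather than a full cross-attention block.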

📝 Abstract

Accurate air quality prediction is essential for public health, environmental monitoring, and industrial safety. However, most existing approaches rely on centralized learning paradigms, which introduce challenges related to scalability, privacy preservation, and communication overhead in distributed Internet of Things (IoT) environments. Moreover, current federated learning (FL) based solutions predominantly utilize unimodal data, limiting their capability to capture complex environmental patterns. To address these limitations, we propose M$^2$FedAQI, a lightweight multimodal federated framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The proposed framework integrates visual and tabular modalities through a feature modulation based fusion mechanism that enables efficient cross-modal interaction while maintaining low computational overhead. M$^2$FedAQI is evaluated on two benchmark datasets, PM25Vision and TRAQID, for both classification and regression tasks under centralized and federated settings. Experimental results demonstrate that M$^2$FedAQI consistently outperforms existing approaches, achieving improvements of up to 11.0\% in Accuracy, 3.53\% in AUC, 12.2\% in F1-score, and 18.0\% in $R^2$, while reducing MAE and RMSE by up to 25.4\% and 20.4\%, respectively, compared with the strongest baselines. Furthermore, deployment on heterogeneous edge devices demonstrates efficient resource utilization in terms of communication overhead, memory footprint, and computational cost. To enhance communication security, TLS-based authentication is incorporated to ensure secure client participation and protect the FL communication channel from unauthorized third-party access without modifying the underlying FL protocol.
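As a rough illustration of TLS-based client authentication layered under an unchanged FL protocol, the sketch below builds a server-side TLS context with Python's standard `ssl` module. It assumes mutual TLS (clients must present a certificate), which the abstract does not spell out. The function name and the certificate paths are hypothetical.

```python
import ssl

def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a TLS context that requires clients to present a certificate.

    The file arguments are hypothetical paths; they are optional here only so
    the sketch runs without real certificates.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
    if cert_file and key_file:
        # Server identity presented to FL clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # CA that signed the certificates of authorized clients.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

ctx = make_server_context()
```

In a deployment, the aggregation server would wrap its listening socket with this context, so unauthorized parties fail the handshake before any model update is exchanged.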

Problem

Research questions and friction points this paper is trying to address.

air quality prediction

federated learning

multimodal data

heterogeneous edge devices

privacy preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Federated Learning

Feature Modulation Fusion

Edge AI
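For readers unfamiliar with the federated setting named above, here is a minimal sketch of one server-side aggregation round. The source does not state which aggregation rule M$^2$FedAQI uses; sample-size-weighted averaging (FedAvg) is assumed here as a common default, and all names and values are illustrative.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical edge devices with flattened model parameters.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]  # number of local training samples per device
global_weights = fed_avg(clients, sizes)
```

Only parameter vectors leave each device; the raw images and sensor readings stay local, which is the source of the privacy benefit.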