🤖 AI Summary
Problem: Existing vertical federated learning (VFL) frameworks for privacy-sensitive multimodal classification in edge AI, such as mobile health diagnostics, rely on simplistic client-side feature fusion, which limits model performance. Method: We propose a novel VFL framework tailored to resource-constrained edge devices. It introduces a lightweight feature disentanglement module on each client to separate modality-specific and shared representations, and a cross-modal Transformer on the server to enable context-aware, privacy-preserving fusion. Contribution/Results: This co-designed mechanism is the first to integrate disentangled representation learning and cross-modal modeling into the VFL paradigm. Evaluated on the multimodal HAM10000 skin lesion dataset, our method significantly outperforms standard VFL baselines, improving robustness and generalization while rigorously preserving data privacy.
📝 Abstract
Vertical Federated Learning (VFL) offers a privacy-preserving paradigm for Edge AI scenarios such as mobile health diagnostics, where sensitive multimodal data reside on distributed, resource-constrained devices. Standard VFL systems, however, often suffer performance limitations due to simplistic feature fusion. This paper introduces HybridVFL, a framework that addresses this bottleneck by pairing client-side feature disentanglement with a server-side cross-modal Transformer for context-aware fusion. In a systematic evaluation on the multimodal HAM10000 skin lesion dataset, HybridVFL significantly outperforms standard federated baselines, underscoring the importance of advanced fusion mechanisms for robust, privacy-preserving learning systems.
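The client/server split described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the paper's implementation: all class names, embedding sizes, the two-linear-head disentangler, and the mean-pooled Transformer encoder on the server are hypothetical choices; in HybridVFL only embeddings (never raw data) would leave the clients.

```python
import torch
import torch.nn as nn

class ClientDisentangler(nn.Module):
    """Hypothetical client-side module: maps a client's raw features to a
    modality-specific embedding and a shared embedding (names assumed)."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.specific = nn.Linear(in_dim, emb_dim)
        self.shared = nn.Linear(in_dim, emb_dim)

    def forward(self, x):
        # Only these embeddings would be transmitted to the server.
        return self.specific(x), self.shared(x)

class ServerFusion(nn.Module):
    """Hypothetical server-side fusion: treat each received embedding as a
    token and fuse them with a small cross-modal Transformer encoder."""
    def __init__(self, emb_dim: int, num_classes: int, nhead: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, tokens):  # tokens: (batch, num_tokens, emb_dim)
        fused = self.encoder(tokens).mean(dim=1)  # pool over modality tokens
        return self.head(fused)

# Toy forward pass with two clients: an image-feature party and a
# metadata party, as in a dermoscopy-image + patient-record split.
img_client = ClientDisentangler(in_dim=512, emb_dim=64)
meta_client = ClientDisentangler(in_dim=16, emb_dim=64)
server = ServerFusion(emb_dim=64, num_classes=7)  # HAM10000 has 7 classes

x_img = torch.randn(8, 512)   # simulated image-branch features
x_meta = torch.randn(8, 16)   # simulated tabular metadata
# Stack the four embeddings (specific + shared per client) as tokens.
tokens = torch.stack([*img_client(x_img), *meta_client(x_meta)], dim=1)
logits = server(tokens)
print(logits.shape)  # torch.Size([8, 7])
```

In a real VFL training loop, gradients with respect to the transmitted embeddings would flow back to each client, so the disentangler and the fusion module are trained jointly without any party exposing its raw modality.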