Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization

📅 2026-04-24

🤖 AI Summary
This work addresses federated cross-modal retrieval under client data heterogeneity, specifically semantic non-IID distributions and missing modalities, which makes it hard for a single global model to balance shared knowledge against local characteristics. The authors propose RCSR, a framework built on a frozen CLIP backbone with lightweight shared adapters for global knowledge transfer and optional client-specific adapters for local personalization. A prototype anchoring mechanism aligns unimodal clients with the global cross-modal semantic space. A server-side, retrieval-centric semantic router assigns aggregation weights to clients according to retrieval consistency, which mitigates alignment drift during heterogeneous updates. In the reported experiments on MS-COCO, Flickr30K, and other benchmarks, RCSR improves global retrieval accuracy and training stability, and it improves client-level retrieval, especially for clients with missing modalities.
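To make the prototype anchoring idea concrete, here is a minimal sketch of how a unimodal client might pull its embeddings toward shared class prototypes. The loss form (a temperature-scaled softmax over cosine similarities), the function names, and the toy vectors are all assumptions made for illustration; the summary does not specify the loss RCSR actually uses.

```python
import math

def normalize(v):
    # L2-normalize a non-zero vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def prototype_anchor_loss(embedding, label, prototypes, temperature=0.07):
    # Hypothetical anchoring loss: cross-entropy over similarities to the
    # global prototypes, so the embedding is pulled toward the prototype of
    # its own semantic class and pushed away from the others.
    logits = [cosine(embedding, p) / temperature for p in prototypes]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]

# Toy global prototypes for two semantic classes (illustrative values only).
prototypes = [[1.0, 0.0], [0.0, 1.0]]

# An image-only client embedding that sits near its class prototype...
aligned = prototype_anchor_loss([0.9, 0.1], 0, prototypes)
# ...versus one that has drifted toward the wrong class.
drifted = prototype_anchor_loss([0.1, 0.9], 0, prototypes)
```

In this sketch the drifted embedding receives a larger loss than the aligned one, which is the signal a client lacking one modality could use to stay consistent with the shared semantic space.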

📝 Abstract
Federated cross-modal retrieval faces severe challenges from heterogeneous client data, particularly non-IID semantic distributions and missing modalities. Under such heterogeneity, a single global model is often insufficient to capture both shared cross-modal knowledge and client-specific characteristics. We propose RCSR, a personalization-friendly federated framework that integrates prototype anchoring, retrieval-centric semantic routing, and optional client-specific adapters. Built on a frozen CLIP backbone, RCSR leverages lightweight shared adapters for global knowledge transfer while supporting efficient local personalization. Prototype anchoring helps unimodal clients align with global cross-modal semantics, and a server-side semantic router adaptively assigns aggregation weights based on retrieval consistency to mitigate alignment drift during heterogeneous updates. Extensive experiments on MS-COCO, Flickr30K, and other benchmarks show that RCSR consistently improves global retrieval accuracy and training stability, while further enhancing client-level retrieval performance, especially for clients with incomplete modalities. Code is available at https://github.com/RezinChow/RCSR-Retrieval-Centric-Semantic-Routing.
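The server-side routing step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the consistency score is taken to be a single number per client (for example, a recall value on a small server-held probe set), the weights come from a temperature-scaled softmax, and adapters are represented as flat lists of floats. The linked repository is the reference for the actual procedure.

```python
import math

def softmax(scores, temperature=1.0):
    # Numerically stable softmax with a temperature parameter.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_adapters(client_adapters, consistency_scores, temperature=0.1):
    # Hypothetical router: clients whose updates retrieve more consistently
    # get larger aggregation weights; the shared adapter is the weighted mean.
    weights = softmax(consistency_scores, temperature)
    dim = len(client_adapters[0])
    merged = [0.0] * dim
    for w, params in zip(weights, client_adapters):
        for i, p in enumerate(params):
            merged[i] += w * p
    return merged, weights

# Three toy clients: two with consistent retrieval, one that has drifted.
adapters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.8, 0.8, 0.2]  # assumed per-client retrieval-consistency scores
merged, weights = aggregate_adapters(adapters, scores)
```

Compared with plain sample-count averaging, a scheme of this kind down-weights the third client, whose low consistency score suggests its update would pull the shared adapter away from the global alignment. Only the shared adapter parameters are aggregated here; the frozen backbone and any client-specific adapters stay out of the average.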

Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Cross-Modal Retrieval
Missing Modalities
Data Heterogeneity
Non-IID

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Cross-Modal Retrieval
Missing Modalities
Semantic Routing
Adapter Personalization