Provable Sparse Inversion and Token Relabel Enhanced One-shot Federated Learning with ViTs

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the misalignment between synthetic images and their labels that data-free methods produce under extreme non-IID settings in one-shot (single-round) federated learning. The authors propose FedMITR, a framework that uses sparse model inversion to generate synthetic images concentrated on semantically salient foreground content. FedMITR also introduces, for the first time, an information-density-based differentiated token relabeling strategy that refines the pseudo-labels of image patches in Vision Transformers. A theoretical analysis based on algorithmic stability shows that sparse inversion removes gradient instability caused by background noise and that token relabeling reduces gradient variance, which together yield a tighter generalization bound. Experiments across various extreme non-IID scenarios show that FedMITR substantially outperforms existing baselines, supporting its effectiveness and robustness.
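The idea of inverting only foreground patches can be illustrated with a small NumPy sketch. This is not the authors' implementation: the per-patch saliency `scores`, the `keep_ratio` cutoff, and the function name `sparse_inversion_step` are all assumptions made for illustration, and the summary does not say how foreground patches are actually selected. The sketch only shows the general mechanism of applying an inversion gradient step to selected patches while leaving the rest frozen.

```python
import numpy as np

def sparse_inversion_step(patches, grads, scores, keep_ratio=0.5, lr=0.1):
    """Apply one gradient step only to the highest-scoring patches.

    patches: (N, D) flattened image patches being synthesized
    grads:   (N, D) inversion-loss gradients w.r.t. each patch
    scores:  (N,)   hypothetical per-patch saliency (an assumption here)
    """
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    keep_idx = np.argsort(scores)[-n_keep:]   # indices of the top-scoring patches
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep_idx] = True
    # Masked update: patches outside the mask receive a zero step (inversion halted).
    updated = patches - lr * grads * mask[:, None]
    return updated, mask

# Toy usage: 4 patches of dimension 3, two of which are treated as foreground.
patches = np.ones((4, 3))
grads = np.full((4, 3), 2.0)
scores = np.array([0.9, 0.1, 0.8, 0.2])
updated, mask = sparse_inversion_step(patches, grads, scores)
```

In this toy run, patches 0 and 2 are updated and patches 1 and 3 keep their original values, which mirrors the "halt the background" behavior described in the summary.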

📝 Abstract

One-Shot Federated Learning, where a central server learns a global model in a single communication round, has emerged as a promising paradigm. However, under extremely non-IID settings, existing data-free methods often generate low-quality data that suffers from severe semantic misalignment with ground-truth labels. To overcome these issues, we propose a novel Federated Model Inversion and Token Relabel (FedMITR) framework, which trains the global model by fully exploiting all patches of synthetic images. Specifically, FedMITR employs sparse model inversion during data generation, selectively inverting semantic foregrounds while halting the inversion of uninformative backgrounds. To address semantically meaningless tokens that hinder ViT predictions, we implement a differentiated strategy: patches with high information density utilize generated pseudo-labels, while patches with low information density are relabeled via ensemble models for robust distillation. Theoretically, our analysis based on algorithmic stability reveals that Sparse Model Inversion eliminates gradient instability arising from background noise, while Token Relabel effectively reduces gradient variance, collectively guaranteeing a tighter generalization bound. Empirically, extensive experimental results demonstrate that FedMITR substantially outperforms existing baselines under various settings.
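The differentiated labeling strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the abstract does not define how information density is measured, so `density` is a hypothetical per-patch score, `threshold` is an arbitrary cutoff, and averaging the softmax outputs of the client models is just one plausible way to form the ensemble target.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relabel_tokens(density, pseudo_label, ensemble_logits, num_classes, threshold=0.5):
    """Build per-patch soft targets of shape (N, C).

    density:         (N,) hypothetical information-density score per patch
    pseudo_label:    int, class assigned to the synthetic image at generation
    ensemble_logits: (K, N, C) per-patch logits from K client models
    """
    one_hot = np.eye(num_classes)[pseudo_label]               # (C,)
    ensemble_probs = softmax(ensemble_logits).mean(axis=0)    # (N, C), assumed averaging
    high = density >= threshold
    # High-density patches keep the generated pseudo-label;
    # low-density patches take the ensemble prediction instead.
    targets = np.where(high[:, None], one_hot[None, :], ensemble_probs)
    return targets, high

# Toy usage: 2 patches, 3 classes, 2 client models with uninformative logits.
density = np.array([0.9, 0.1])
ensemble_logits = np.zeros((2, 2, 3))
targets, high = relabel_tokens(density, pseudo_label=1,
                               ensemble_logits=ensemble_logits, num_classes=3)
```

Here the first patch keeps the one-hot pseudo-label for class 1, while the second receives the ensemble distribution, which is uniform in this toy case because all logits are zero.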

Problem

Research questions and friction points this paper is trying to address.

One-Shot Federated Learning

non-IID

data-free

semantic misalignment

synthetic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Model Inversion

Token Relabel

One-Shot Federated Learning