🤖 AI Summary
This work addresses the poor scalability and performance instability of unsupervised multi-source federated domain adaptation (UMSFDA) caused by high source-domain heterogeneity. Methodologically, we propose a group-wise discrepancy minimization objective coupled with a temperature-controlled centroid-weighting strategy, enabling approximate all-pairwise domain alignment and dynamic selection of relevant source domains. This supports efficient parallel training over large-scale heterogeneous source domains without raw data sharing, unifying unsupervised domain adaptation, federated learning, and group-level alignment. Evaluated on standard benchmarks and the newly constructed Digit-18 benchmark of 18 diverse digit datasets, our method achieves significant improvements over state-of-the-art approaches, particularly under high domain diversity, demonstrating superior robustness, computational efficiency, and generalization.
📝 Abstract
Unsupervised multi-source domain adaptation (UMDA) aims to learn models that generalize to an unlabeled target domain by leveraging labeled data from multiple, diverse source domains. While distributed UMDA methods address privacy constraints by avoiding raw data sharing, existing approaches typically assume a small number of sources and fail to scale: as the number of heterogeneous domains grows, they become impractical, incurring high computational overhead or unstable performance. We propose GALA, a scalable and robust federated UMDA framework with two key components: (1) a novel inter-group discrepancy minimization objective that efficiently approximates full pairwise domain alignment without quadratic computation in the number of sources; and (2) a temperature-controlled, centroid-based weighting strategy that dynamically prioritizes source domains by their alignment with the target. Together, these components enable stable, parallelizable training across large numbers of heterogeneous sources. To evaluate performance in high-diversity scenarios, we introduce Digit-18, a new benchmark comprising 18 digit datasets with varied synthetic and real-world domain shifts. Extensive experiments show that GALA consistently achieves competitive or state-of-the-art results on standard benchmarks and significantly outperforms prior methods in diverse multi-source settings where others fail to converge.
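To make the two components concrete, the sketch below shows one plausible PyTorch form of group-wise discrepancy minimization and temperature-controlled centroid weighting. This is a minimal illustration, not the paper's actual implementation: the random grouping scheme, the squared-centroid-distance discrepancy (a stand-in for whatever divergence GALA uses), and all function names and defaults are assumptions.

```python
import torch

def group_discrepancy(source_feats, num_groups=4):
    """Illustrative group-wise alignment loss (assumed form, not GALA's exact
    objective). Randomly partitions the K source domains into G groups and
    aligns group-level mean features, so the O(K^2) all-pairwise cost drops
    to O(G^2)."""
    perm = torch.randperm(len(source_feats)).tolist()
    groups = [perm[i::num_groups] for i in range(num_groups)]
    # One centroid per group: mean of the member domains' mean features.
    centroids = [
        torch.stack([source_feats[j].mean(dim=0) for j in g]).mean(dim=0)
        for g in groups if g
    ]
    loss, pairs = 0.0, 0
    for a in range(len(centroids)):
        for b in range(a + 1, len(centroids)):
            # Squared L2 distance between group centroids as a simple
            # placeholder discrepancy measure.
            loss = loss + (centroids[a] - centroids[b]).pow(2).sum()
            pairs += 1
    return loss / max(pairs, 1)

def centroid_weights(source_feats, target_feats, temperature=0.5):
    """Illustrative temperature-controlled weighting (assumed form): a softmax
    over negative distances between each source's feature centroid and the
    target's, so better-aligned sources receive higher weight."""
    target_c = target_feats.mean(dim=0)
    dists = torch.stack(
        [(f.mean(dim=0) - target_c).norm() for f in source_feats]
    )
    return torch.softmax(-dists / temperature, dim=0)

# Example: 6 hypothetical source domains, 32 feature vectors of dim 128 each.
feats = [torch.randn(32, 128) for _ in range(6)]
tgt = torch.randn(64, 128)
print(centroid_weights(feats, tgt, temperature=0.5))
print(group_discrepancy(feats, num_groups=3))
```

In this kind of scheme, lowering the temperature concentrates weight on the few best-aligned source domains, while raising it spreads weight more uniformly across all sources.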