🤖 AI Summary
Federated learning (FL) lacks a standardized evaluation framework covering Adaptation, Trust, and Reasoning, which hinders fair algorithm comparison and systematic progress.
Method: We propose ATR-Bench, the first unified three-dimensional evaluation framework for FL, covering adaptation to client heterogeneity, trust assurance in malicious or unreliable environments, and quantification of model reasoning capability. We design standardized task paradigms, multi-dimensional metrics, heterogeneous simulation environments, and adversarial robustness testing protocols. Furthermore, we introduce a literature-driven reasoning analysis framework to bridge the longstanding gap in FL reasoning evaluation.
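To make "heterogeneous simulation environments" concrete, the sketch below partitions a labeled dataset across clients using a Dirichlet label distribution, a common way to simulate non-IID clients. The function name `dirichlet_partition` and the concentration parameter `alpha` are illustrative assumptions for this sketch, not names from the released codebase.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-skewed label mixes.

    Smaller alpha -> more heterogeneous (non-IID) clients; larger alpha -> closer to IID.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        # Fraction of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions) * len(cls_idx)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(cls_idx, cut_points)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(idx) for idx in client_indices]

# Example: 10 clients over a toy 10-class label vector.
labels = np.random.randint(0, 10, size=5000)
shards = dirichlet_partition(labels, num_clients=10, alpha=0.3)
print([len(s) for s in shards])
```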
Contribution/Results: Leveraging this benchmark, we comprehensively evaluate mainstream FL algorithms across the Adaptation and Trust dimensions. We publicly release an extensible codebase and a continuously updated knowledge repository, fostering standardized, reproducible FL evaluation and accelerating community-driven progress.
📝 Abstract
Federated Learning (FL) has emerged as a promising paradigm for training models collaboratively across decentralized participants while preserving data privacy. As FL adoption grows, numerous techniques have been proposed to tackle its practical challenges. However, the lack of standardized evaluation across key dimensions hampers systematic progress and fair comparison of FL methods. In this work, we introduce ATR-Bench, a unified framework for analyzing federated learning through three foundational dimensions: Adaptation, Trust, and Reasoning. We provide an in-depth examination of the conceptual foundations, task formulations, and open research challenges associated with each theme. We extensively benchmark representative methods and datasets for adaptation to heterogeneous clients and for trustworthiness in adversarial or unreliable environments. Due to the lack of reliable metrics and models for reasoning in FL, we provide only literature-driven insights for this dimension. ATR-Bench lays the groundwork for a systematic and holistic evaluation of federated learning with real-world relevance. We will make our complete codebase publicly accessible, along with a curated repository that continuously tracks new developments and research in the FL literature.
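As a minimal illustration of the Trust dimension (not the benchmark's actual protocol), the sketch below contrasts plain FedAvg aggregation with a coordinate-wise trimmed mean when a few clients submit poisoned updates; all function names and the toy attack are assumptions made for this example.

```python
import numpy as np

def fedavg(client_updates, weights=None):
    """Weighted average of client model updates (FedAvg-style aggregation)."""
    updates = np.stack(client_updates)            # shape: (num_clients, num_params)
    if weights is None:
        weights = np.ones(len(client_updates))
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return weights @ updates

def trimmed_mean(client_updates, trim_ratio=0.2):
    """Coordinate-wise trimmed mean: drop the most extreme updates per coordinate."""
    updates = np.stack(client_updates)
    k = int(trim_ratio * len(client_updates))
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[k:len(client_updates) - k].mean(axis=0)

# Toy scenario: 8 honest clients plus 2 clients sending scaled (poisoned) updates.
rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.1, size=4) for _ in range(8)]
malicious = [np.full(4, 50.0) for _ in range(2)]
updates = honest + malicious

print("FedAvg:      ", fedavg(updates))          # pulled far off by the attackers
print("Trimmed mean:", trimmed_mean(updates))    # stays close to the honest mean
```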