EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints

📅 2026-05-08

🤖 AI Summary

Existing research on federated fine-tuning of large language models (LLMs) relies predominantly on simulated environments and often overlooks the resource and runtime constraints of real-world edge devices, which limits what it can say about practical deployment feasibility. This work proposes EdgeFlowerTune, a deployment-oriented benchmark for federated LLM fine-tuning on real edge systems, instantiated as an end-to-end platform built on Flower and MobileFineTuner that spans commercial Android smartphones and NVIDIA edge development boards. The benchmark introduces three complementary evaluation protocols, Quality-under-Budget, Cost-to-Target, and Robustness, which jointly assess model quality, system overhead (communication, wall-clock latency, memory, and energy consumption), and resilience to dynamic edge conditions. The results show that accuracy-only evaluation can mislead method selection: methods with similar final quality may differ substantially in deployability once system constraints are taken into account. The benchmark is intended as a reproducible, system-aware evaluation for federated LLM fine-tuning at the edge.

📝 Abstract

Federated fine-tuning offers a promising paradigm for adapting large language models (LLMs) on edge devices by leveraging the rich, diverse, and continuously generated data from smartphones and IoT devices without compromising user data privacy. Such edge-side adaptation can improve model personalization, robustness, and responsiveness to local contexts. However, the practical feasibility of federated LLM fine-tuning on real edge devices remains unclear, as most existing work focuses on cross-silo or simulation-based settings, overlooking the resource and runtime constraints that determine whether a method is deployable on real edge systems. We present EdgeFlowerTune, a deployment-oriented benchmark for federated LLM fine-tuning under realistic edge-system constraints. EdgeFlowerTune jointly evaluates model quality and system costs, including communication, wall-clock latency, memory usage, energy consumption, and robustness to dynamic edge conditions. To compare methods in terms of effectiveness, efficiency, and robustness, EdgeFlowerTune introduces three complementary protocols: Quality-under-Budget, Cost-to-Target, and Robustness. We instantiate EdgeFlowerTune as a real-device platform built on Flower and MobileFineTuner, spanning commercial Android smartphones and NVIDIA edge development boards. Our benchmark results show that accuracy-only evaluation can lead to misleading conclusions: methods with similar final quality may differ substantially in deployability once realistic system constraints are considered. EdgeFlowerTune provides a reproducible benchmark for system-aware evaluation of federated LLM fine-tuning at the edge.
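The abstract names the Quality-under-Budget and Cost-to-Target protocols but does not define them, so the following is only a minimal sketch of one plausible reading: the first asks what quality a method reaches within a fixed cost budget, the second asks how much cost a method needs to reach a fixed quality target. The `RoundLog` record, both function names, and all numbers are hypothetical illustrations and are not taken from EdgeFlowerTune, Flower, or MobileFineTuner. The cost is a single abstract unit; in practice it could stand for bytes communicated, wall-clock seconds, or joules.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoundLog:
    """Hypothetical per-round record: quality after the round, cost the round added."""
    quality: float  # e.g. validation accuracy after aggregation
    cost: float     # cost of this round in one unit (bytes, seconds, joules, ...)


def quality_under_budget(rounds: Sequence[RoundLog], budget: float) -> Optional[float]:
    """Best quality reached while cumulative cost stays within `budget` (assumed reading)."""
    spent = 0.0
    best: Optional[float] = None
    for r in rounds:
        spent += r.cost
        if spent > budget:
            break  # this round no longer fits in the budget
        best = r.quality if best is None else max(best, r.quality)
    return best  # None if not even the first round fits


def cost_to_target(rounds: Sequence[RoundLog], target: float) -> Optional[float]:
    """Cumulative cost at the first round whose quality meets `target` (assumed reading)."""
    spent = 0.0
    for r in rounds:
        spent += r.cost
        if r.quality >= target:
            return spent
    return None  # target never reached


# Made-up logs for two methods that end at the same final quality (0.72).
method_a = [RoundLog(0.50, 10.0), RoundLog(0.60, 10.0), RoundLog(0.70, 10.0), RoundLog(0.72, 10.0)]
method_b = [RoundLog(0.40, 4.0), RoundLog(0.55, 4.0), RoundLog(0.68, 4.0), RoundLog(0.72, 4.0)]

print(quality_under_budget(method_a, 20.0), quality_under_budget(method_b, 20.0))  # 0.6 0.72
print(cost_to_target(method_a, 0.70), cost_to_target(method_b, 0.70))              # 30.0 16.0
```

With these invented logs, both methods finish at the same quality, yet method B reaches more within a budget of 20 units and hits the 0.70 target at roughly half the cost. This is the kind of gap that an accuracy-only comparison would hide. A Robustness comparison could reuse the same two numbers on logs collected under perturbed conditions, but the abstract does not specify which perturbations the benchmark applies, so it is not sketched here.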

Problem

Research questions and friction points this paper is trying to address.

federated learning

large language models

edge computing

system constraints

fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated LLM fine-tuning

Edge computing

System-aware benchmarking