EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints

📅 2026-05-08

🤖 AI Summary
Existing research on federated fine-tuning of large language models (LLMs) relies predominantly on cross-silo or simulated settings and overlooks the resource and runtime constraints of real edge devices, which limits any assessment of practical deployability. This work presents EdgeFlowerTune, a deployment-oriented benchmark for federated LLM fine-tuning on real edge systems, instantiated as an end-to-end platform built on Flower and MobileFineTuner that spans commercial Android smartphones and NVIDIA edge development boards. The benchmark defines three complementary evaluation protocols (Quality-under-Budget, Cost-to-Target, and Robustness) to jointly assess model quality, system costs (communication, wall-clock latency, memory usage, and energy consumption), and robustness to dynamic edge conditions. The benchmark results show that accuracy-only evaluation can mislead method selection: methods with similar final quality may differ substantially in deployability once realistic system constraints are considered.
📝 Abstract
Federated fine-tuning offers a promising paradigm for adapting large language models (LLMs) on edge devices by leveraging the rich, diverse, and continuously generated data from smartphones and IoT devices without compromising user data privacy. Such edge-side adaptation can improve model personalization, robustness, and responsiveness to local contexts. However, the practical feasibility of federated LLM fine-tuning on real edge devices remains unclear, as most existing work focuses on cross-silo or simulation-based settings, overlooking the resource and runtime constraints that determine whether a method is deployable on real edge systems. We present EdgeFlowerTune, a deployment-oriented benchmark for federated LLM fine-tuning under realistic edge-system constraints. EdgeFlowerTune jointly evaluates model quality and system costs, including communication, wall-clock latency, memory usage, energy consumption, and robustness to dynamic edge conditions. To compare methods in terms of effectiveness, efficiency, and robustness, EdgeFlowerTune introduces three complementary protocols: Quality-under-Budget, Cost-to-Target, and Robustness. We instantiate EdgeFlowerTune as a real-device platform built on Flower and MobileFineTuner, spanning commercial Android smartphones and NVIDIA edge development boards. Our benchmark results show that accuracy-only evaluation can lead to misleading conclusions: methods with similar final quality may differ substantially in deployability once realistic system constraints are considered. EdgeFlowerTune provides a reproducible benchmark for system-aware evaluation of federated LLM fine-tuning at the edge.
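The abstract names the Quality-under-Budget and Cost-to-Target protocols but does not define them here. The sketch below shows one plausible reading, inferred only from the protocol names: the first reports the best quality a method reaches without exceeding a cost budget, the second reports the smallest cumulative cost at which a method first reaches a quality target. The `RoundRecord` type, both function names, the cost unit, and all numbers are hypothetical illustrations, not the paper's definitions or results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoundRecord:
    """One federated round (hypothetical log format, not from the paper)."""
    cumulative_cost: float  # assumed unit, e.g. joules, seconds, or MB sent
    quality: float          # assumed metric where higher is better


def quality_under_budget(log: Sequence[RoundRecord], budget: float) -> Optional[float]:
    """Best quality among rounds whose cumulative cost fits the budget."""
    affordable = [r.quality for r in log if r.cumulative_cost <= budget]
    return max(affordable) if affordable else None


def cost_to_target(log: Sequence[RoundRecord], target: float) -> Optional[float]:
    """Smallest cumulative cost at which quality first meets the target."""
    for record in sorted(log, key=lambda r: r.cumulative_cost):
        if record.quality >= target:
            return record.cumulative_cost
    return None  # target never reached within the logged rounds


# Made-up toy logs: two methods with similar final quality, different cost.
method_a = [RoundRecord(10, 0.50), RoundRecord(20, 0.62), RoundRecord(30, 0.70)]
method_b = [RoundRecord(25, 0.55), RoundRecord(50, 0.66), RoundRecord(75, 0.71)]

for name, log in (("A", method_a), ("B", method_b)):
    print(name, quality_under_budget(log, budget=30), cost_to_target(log, target=0.65))
```

In these invented logs the two methods end at nearly the same quality (0.70 and 0.71), yet under a budget of 30 cost units method A reaches 0.70 while method B reaches only 0.55. This is the kind of gap the abstract says accuracy-only evaluation would hide.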
Problem

Research questions and friction points this paper is trying to address.

federated learning
large language models
edge computing
system constraints
fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated LLM fine-tuning
Edge computing
System-aware benchmarking
Resource constraints
Deployability