🤖 AI Summary
This work addresses the joint optimization of federated training and inference on resource-constrained edge devices by formulating it, for the first time, as a multi-objective Markov decision process. The formulation explicitly accounts for inference accuracy (modeled as a function of data and model freshness) as well as latency and energy consumption. To solve it, the authors propose Constrained Multi-Objective Proximal Policy Optimization (C-MOPPO), which uses a tandem-queue-inspired mechanism to couple inference requests with training data and jointly optimizes device mode selection and the allocation of communication and computation resources. Experimental results show that C-MOPPO outperforms baseline methods across diverse system configurations and generates dense, high-quality Pareto solution sets that balance the three competing objectives.
📝 Abstract
Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) by enabling collaborative model training across edge devices while protecting data privacy. In this paper, we propose an online optimization framework that jointly manages federated training and inference on resource-constrained edge devices. We introduce a tandem-queue-inspired conversion mechanism that bridges inference requests and training data, and we incorporate both data freshness and model freshness into the accuracy formulation to capture temporal dynamics in real-world environments. To maximize inference accuracy while minimizing latency and energy consumption, we jointly optimize the mode selection of edge devices together with their communication and computation resource allocation. We formulate this as a multi-objective optimization problem, which is NP-hard and further complicated by the online setting. To address these challenges, we transform the problem into a multi-objective Markov decision process (MOMDP) and develop a constrained multi-objective proximal policy optimization (C-MOPPO) algorithm. Specifically, C-MOPPO first learns a set of policies with different preferences across the three objectives, then leverages constrained policy optimization to enrich the Pareto front and obtain high-quality, dense solutions. Extensive experiments demonstrate that C-MOPPO achieves well-balanced trade-offs among the objectives and significantly outperforms baselines under various system configurations.
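To make the notions of "preferences across the three objectives" and "Pareto front" concrete, the following is a minimal Python sketch, not the authors' C-MOPPO implementation. It assumes each policy is summarized by a hypothetical triple (accuracy, latency, energy), that a preference is a weight vector applied through a simple linear scalarization, and that accuracy is maximized while latency and energy are minimized. The function names and the scalarization form are illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Hypothetical summary of one policy's outcome: (accuracy, latency, energy).
# Accuracy is to be maximized; latency and energy are to be minimized.
Objectives = Tuple[float, float, float]


def scalarize(obj: Objectives, weights: Sequence[float]) -> float:
    """Collapse the three objectives into one score under a preference vector.

    Assumption: a plain linear scalarization, with the two cost objectives
    entering negatively. The paper may use a different form.
    """
    accuracy, latency, energy = obj
    w_acc, w_lat, w_eng = weights
    return w_acc * accuracy - w_lat * latency - w_eng * energy


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up values for illustration only; these are not results from the paper.
    candidates = [
        (0.90, 10.0, 5.0),
        (0.80, 12.0, 6.0),
        (0.85, 8.0, 7.0),
        (0.90, 10.0, 5.5),
    ]
    front = pareto_front(candidates)
    print("Pareto front:", front)

    # Different preference vectors can select different points on the front.
    for weights in [(1.0, 0.01, 0.02), (1.0, 0.05, 0.0)]:
        best = max(front, key=lambda p: scalarize(p, weights))
        print(weights, "->", best)
```

Under these assumptions, each preference vector picks out one trade-off point, and a denser set of non-dominated points gives a decision maker finer-grained choices among accuracy, latency, and energy.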