Safe Multi-Agent Deep Reinforcement Learning for Privacy-Aware Edge-Device Collaborative DNN Inference

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses privacy protection, resource constraints, and dynamic model deployment for deep neural network (DNN) inference on edge devices. The authors propose a privacy-aware collaborative inference framework, formulate the joint problem as a constrained Markov decision process, and introduce a hierarchical constrained multi-agent proximal policy optimization algorithm with Lagrangian relaxation (HC-MAPPO-L). The algorithm combines Lagrangian dual-variable updates with an autoregressive model deployment policy and an attention-based resource allocation policy. While satisfying long-term delay constraints, the method balances energy consumption and privacy cost and outperforms representative baselines across varying problem scales and resource configurations.

📝 Abstract
As Deep Neural Network (DNN) inference becomes increasingly prevalent on edge and mobile platforms, critical challenges emerge in privacy protection, resource constraints, and dynamic model deployment. This paper proposes a privacy-aware collaborative inference framework, in which adaptive model partitioning is performed across edge devices and servers. To jointly optimize inference delay, energy consumption, and privacy cost under dynamic service demands and resource constraints, we formulate the joint problem as a Constrained Markov Decision Process (CMDP) that integrates model deployment, user-server association, model partitioning, and resource allocation. We propose a Hierarchical Constrained Multi-Agent Proximal Policy Optimization with Lagrangian relaxation (HC-MAPPO-L) algorithm, a safe reinforcement learning framework that enhances Multi-Agent Proximal Policy Optimization (MAPPO) with adaptive Lagrangian dual updates to enforce long-term delay constraints. To ensure tractability while maintaining coordination, we decompose the CMDP into three hierarchically structured policy layers: an autoregressive model deployment policy, a Lagrangian-enhanced user association and model partitioning policy, and an attention-based resource allocation policy. Extensive experimental results demonstrate that HC-MAPPO-L consistently satisfies stringent delay constraints while achieving a superior balance between energy consumption and privacy cost, outperforming representative baseline algorithms across varying problem scales and resource configurations.
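The abstract mentions adaptive Lagrangian dual updates that enforce a long-term delay constraint. The following is a minimal sketch of the generic Lagrangian-relaxation pattern used in constrained reinforcement learning, not the authors' implementation. All names (`lagrangian_reward`, `dual_update`), the cost weights, the learning rate, and the sample delays are hypothetical and chosen only for illustration; the page does not give the paper's actual reward or constraint definitions.

```python
def lagrangian_reward(energy, privacy_cost, delay, lam, w_energy=1.0, w_privacy=1.0):
    """Penalized per-step reward (assumed form).

    The objective part is the negative weighted sum of energy and privacy cost;
    the delay enters as a penalty scaled by the dual variable lam.
    """
    return -(w_energy * energy + w_privacy * privacy_cost) - lam * delay


def dual_update(lam, avg_delay, delay_limit, lr=0.05):
    """Projected dual ascent on the multiplier.

    lam grows while the average delay exceeds the limit, shrinks otherwise,
    and is clipped at zero so it stays a valid multiplier.
    """
    return max(0.0, lam + lr * (avg_delay - delay_limit))


if __name__ == "__main__":
    lam = 0.0
    delay_limit = 1.0  # hypothetical long-term delay budget
    # Hypothetical average delays observed over successive training iterations.
    for avg_delay in [1.5, 1.4, 1.2, 0.9, 0.8]:
        lam = dual_update(lam, avg_delay, delay_limit)
        print(f"avg_delay={avg_delay:.1f} lambda={lam:.3f}")
```

In this pattern the policy is trained on the penalized reward (for example with PPO), while the multiplier is adjusted on a slower schedule, so the penalty tightens only when the constraint is actually violated.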
Problem

Research questions and friction points this paper is trying to address.

privacy-aware
edge-device collaboration
DNN inference
resource constraints
delay constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained Markov Decision Process
Hierarchical Multi-Agent Reinforcement Learning
Privacy-Aware Inference
Model Partitioning
Lagrangian Relaxation
Hong Wang
School of Future Science and Engineering, Soochow University, Suzhou 215006, China
Xuwei Fan
College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Zhipeng Cheng
Soochow University
Edge Intelligence, Federated Learning, UAV Networks, Service Computing
Yachao Yuan
University of Goettingen
defects detection, defects segmentation
Minghui Min
China University of Mining and Technology (CUMT)
Wireless communications, Network Security, Privacy, Deep learning
Minghui Liwang
Department of Control Science and Engineering, Tongji University, Shanghai 201804, China
Xiaoyu Xia
School of Computing Technologies, RMIT University
Parallel and Distributed Computing, System Security, Edge Computing, Sustainable Computing