🤖 AI Summary
Coordinating large-scale electric vehicle (EV) fleets in vehicle-to-grid (V2G) systems while simultaneously ensuring data privacy and achieving global optimization remains a challenging distributed control problem. Method: This paper proposes a digital twin–enabled multi-agent reinforcement learning (MARL) framework. Specifically, it introduces the first digital twin–assisted multi-agent deep deterministic policy gradient (MADDPG) algorithm, in which a collaboratively built global prediction model augments the centralized critic, enabling policy coordination using only high-level aggregated information, without accessing raw individual data. The framework integrates lightweight digital twin modeling, decentralized data sharing, and simulation-driven training. Contribution/Results: Experimental results demonstrate that the proposed method achieves control performance comparable to standard MADDPG in simulation, while significantly enhancing data privacy and system decentralization.
📝 Abstract
The coordination of large-scale, decentralised systems, such as a fleet of Electric Vehicles (EVs) in a Vehicle-to-Grid (V2G) network, presents a significant challenge for modern control systems. While collaborative Digital Twins have been proposed as a solution to manage such systems without compromising the privacy of individual agents, deriving globally optimal control policies from the high-level information they share remains an open problem. This paper introduces the Digital Twin-Assisted Multi-Agent Deep Deterministic Policy Gradient (DT-MADDPG) algorithm, a novel hybrid architecture that integrates a multi-agent reinforcement learning framework with a collaborative DT network. Our core contribution is a simulation-assisted learning algorithm in which the centralised critic is enhanced by a predictive global model that is collaboratively built from the privacy-preserving data shared by individual DTs. This approach removes the need to collect sensitive raw data at a centralised entity, a requirement of traditional multi-agent learning algorithms. Experimental results in a simulated V2G environment demonstrate that DT-MADDPG can achieve coordination performance comparable to the standard MADDPG algorithm while offering significant advantages in terms of data privacy and architectural decentralisation. This work presents a practical and robust framework for deploying intelligent, learning-based coordination in complex, real-world cyber-physical systems.
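The key architectural difference described above can be illustrated with a minimal sketch. This is not the paper's implementation; all function names and the use of a simple mean aggregation are illustrative assumptions. It contrasts what the centralised critic consumes in standard MADDPG (every agent's raw observation) versus the DT-assisted variant (only a collaboratively aggregated forecast from the digital twins, plus the joint actions).

```python
# Illustrative sketch only; interfaces and the mean-aggregation rule are
# assumptions, not the paper's actual implementation.

def critic_input_maddpg(raw_obs, actions):
    """Standard MADDPG: the centralised critic concatenates every
    agent's raw observation with the joint actions."""
    return [x for obs in raw_obs for x in obs] + list(actions)

def aggregate_dt_forecasts(dt_forecasts):
    """Privacy-preserving aggregation (assumed): each digital twin shares
    only a high-level forecast (e.g. predicted net load over a horizon);
    only the element-wise mean of those forecasts is exposed."""
    n = len(dt_forecasts)
    horizon = len(dt_forecasts[0])
    return [sum(f[t] for f in dt_forecasts) / n for t in range(horizon)]

def critic_input_dt_maddpg(dt_forecasts, actions):
    """DT-MADDPG (sketch): the critic sees the aggregated global
    prediction plus joint actions; no raw individual data is collected."""
    return aggregate_dt_forecasts(dt_forecasts) + list(actions)
```

Note that the DT-assisted critic's input size depends only on the forecast horizon and the number of actions, not on the number of agents' raw observation dimensions, which is one way the architecture decouples coordination from raw data collection.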