🤖 AI Summary
This study addresses two challenges in decentralized vehicle-to-vehicle (V2V) energy trading: self-interested agents with heterogeneous charging demands, and uncertain arrival and departure times. The authors propose Nash-MADDPG, an algorithm that integrates the Nash bargaining solution into a multi-agent deep deterministic policy gradient (MADDPG) framework. Bilateral dynamic pricing, combined with a bargaining-oriented price-proximity reward, guides agents to learn incentive-aligned trading strategies without central coordination. In a 30-day simulation, the method achieves 61.6% higher social welfare, 62.9% greater trading volume, and a 40.1% improvement in the Jain fairness index compared with a Double Auction baseline, while prices remain empirically stable near the Nash bargaining benchmark. Scalability and price stability are further validated across agent populations ranging from 6 to 100.
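For readers unfamiliar with the fairness metric reported above, the following is a minimal sketch of Jain's fairness index in its standard form, J = (Σx)² / (n · Σx²), which ranges from 1/n (one agent receives everything) to 1 (all agents receive equal shares). The function name, the choice of per-agent benefit as the input, and the handling of an all-zero allocation are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    `allocations` holds one non-negative value per agent (for example,
    per-agent trading benefit; which quantity the paper uses is an assumption).
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("allocations must not be empty")
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if sum_sq == 0:
        # Assumed convention: an all-zero allocation is treated as perfectly equal.
        return 1.0
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    print(jain_index([1.0, 1.0, 1.0, 1.0]))  # equal shares
    print(jain_index([1.0, 0.0, 0.0, 0.0]))  # one agent takes everything
```

Equal shares give an index of 1, while the fully concentrated case with four agents gives 1/4, so a higher index indicates a more even split of benefits.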
📝 Abstract
Vehicle-to-vehicle (V2V) energy trading enables decentralized peer-to-peer energy exchange among electric vehicles (EVs), reducing grid dependency while monetizing surplus capacity. However, coordinating self-interested EV agents with diverse charging needs and uncertain arrival-departure schedules remains challenging. Existing approaches either rely on centralized optimization, which is computationally limited, or lack fairness guarantees. This paper integrates the Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient, yielding Nash-MADDPG, for incentive-aligned V2V energy trading. Nash bargaining determines efficient bilateral pricing, while Nash-guided price-proximity rewards steer agent learning toward bargaining-optimal strategies. Evaluation over 30 days of continuous operation demonstrates a 61.6% improvement in social welfare and a 62.9% improvement in trading volume over Double Auction, along with superior fairness, including a 40.1% improvement in Jain's index. Testing with 6 to 100 agents over a 30-day horizon with continuous vehicle turnover confirms scalability across population sizes and empirically stable pricing near the Nash Bargaining benchmark.
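To make the bilateral pricing idea concrete, here is a minimal sketch of a Nash bargaining price for a single seller-buyer pair. It assumes linear surplus per kWh, a seller reservation price `seller_floor`, and a buyer ceiling `buyer_ceiling` (for instance, the grid tariff); these names, the bargaining-power parameter, and the shape of the proximity reward are hypothetical and not taken from the paper. Maximizing the Nash product (p − c)^α (v − p)^(1−α) over the price p gives p* = c + α(v − c), which is the midpoint when both sides have equal power.

```python
from typing import Optional


def nash_bargaining_price(
    seller_floor: float,
    buyer_ceiling: float,
    seller_power: float = 0.5,
) -> Optional[float]:
    """Price maximizing (p - c)^a * (v - p)^(1 - a) for c < p < v.

    Returns None when there is no positive surplus to split.
    """
    if not 0.0 < seller_power < 1.0:
        raise ValueError("seller_power must lie strictly between 0 and 1")
    if buyer_ceiling <= seller_floor:
        return None  # no mutually beneficial trade exists
    return seller_floor + seller_power * (buyer_ceiling - seller_floor)


def price_proximity_reward(
    offered_price: float, nash_price: float, scale: float = 1.0
) -> float:
    """Assumed reward shape: negative scaled distance to the Nash price."""
    return -abs(offered_price - nash_price) / scale


if __name__ == "__main__":
    p_star = nash_bargaining_price(seller_floor=0.10, buyer_ceiling=0.30)
    print(p_star, price_proximity_reward(0.25, p_star))
```

Under this sketch, an agent whose offer sits closer to the bargaining price receives a reward nearer to zero, which is one simple way a price-proximity signal could be added to an MADDPG reward; the paper's actual reward definition may differ.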