🤖 AI Summary
Existing peer-to-peer (P2P) energy trading frameworks lack robust mechanisms to ensure fairness, particularly in multi-agent microgrid markets. Method: This paper proposes FairMarket-RL, a fairness-aware trading agent framework that integrates large language models (LLMs) with reinforcement learning (RL). An LLM serves as a scalable, interpretable real-time fairness evaluator, replacing inflexible rule-based criteria, and a dynamic reward shaping mechanism governed by scheduled λ-coefficients jointly optimizes two fairness metrics: Fairness-To-Buyer (FTB) and Fairness-Between-Sellers (FBS). Trading agents are trained with Independent Proximal Policy Optimization (IPPO). Contribution/Results: Experiments in simulated microgrid environments show that the framework satisfies over 90% of buyer demand, narrows profit disparities between sellers, and consistently reaches FTB and FBS scores above 0.80. It further exhibits strong convergence, market equilibrium properties, and scalability across heterogeneous agents.
📝 Abstract
Peer-to-peer (P2P) trading is increasingly recognized as a key mechanism for decentralized market regulation, yet existing approaches often lack robust frameworks to ensure fairness. This paper presents FairMarket-RL, a novel hybrid framework that combines Large Language Models (LLMs) with Reinforcement Learning (RL) to enable fairness-aware trading agents. In a simulated P2P microgrid with multiple sellers and buyers, the LLM acts as a real-time fairness critic, evaluating each trading episode using two metrics: Fairness-To-Buyer (FTB) and Fairness-Between-Sellers (FBS). These fairness scores are integrated into agent rewards through scheduled λ-coefficients, forming an adaptive LLM-guided reward shaping loop that replaces brittle, rule-based fairness constraints. Agents are trained using Independent Proximal Policy Optimization (IPPO) and achieve equitable outcomes, fulfilling over 90% of buyer demand, maintaining fair seller margins, and consistently reaching FTB and FBS scores above 0.80. The training process demonstrates that fairness feedback improves convergence, reduces buyer shortfalls, and narrows profit disparities between sellers. With its language-based critic, the framework scales naturally, and its extension to a large power distribution system with household prosumers illustrates its practical applicability. FairMarket-RL thus offers a scalable, equity-driven solution for autonomous trading in decentralized energy systems.
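The reward shaping loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact reward formula or the form of the λ schedule, so everything here is an assumption made for illustration: the additive combination, the linear ramp, and the names `LambdaSchedule` and `shaped_reward` are hypothetical. The FTB and FBS values are taken as inputs, standing in for scores returned by the LLM critic and assumed to lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class LambdaSchedule:
    """Linear ramp for a fairness coefficient (hypothetical schedule,
    not taken from the paper)."""
    start: float
    end: float
    ramp_episodes: int

    def value(self, episode: int) -> float:
        # Clamp progress to [0, 1] so the coefficient holds at `end`
        # once the ramp is complete.
        progress = min(max(episode / self.ramp_episodes, 0.0), 1.0)
        return self.start + (self.end - self.start) * progress


def shaped_reward(
    base_reward: float,
    ftb: float,
    fbs: float,
    episode: int,
    lam_ftb: LambdaSchedule,
    lam_fbs: LambdaSchedule,
) -> float:
    """Add scheduled fairness bonuses to an agent's base trading reward.

    Assumes an additive form:
    r = r_base + lambda_FTB(t) * FTB + lambda_FBS(t) * FBS.
    """
    return (
        base_reward
        + lam_ftb.value(episode) * ftb
        + lam_fbs.value(episode) * fbs
    )


# Usage: fairness terms are phased in over the first 100 episodes.
lam = LambdaSchedule(start=0.0, end=1.0, ramp_episodes=100)
for ep in (0, 50, 200):
    print(ep, shaped_reward(2.0, ftb=0.8, fbs=0.6, episode=ep,
                            lam_ftb=lam, lam_fbs=lam))
```

Ramping the coefficients up from zero is one plausible design choice: agents first learn basic trading behavior from the base reward, and the fairness signal gains weight as training progresses. Whether FairMarket-RL schedules its λ-coefficients this way is not stated in the abstract.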