FairMarket-RL: LLM-Guided Fairness Shaping for Multi-Agent Reinforcement Learning in Peer-to-Peer Markets

📅 2025-06-27

🤖 AI Summary

Problem: Existing peer-to-peer (P2P) energy trading frameworks lack robust mechanisms for ensuring fairness, particularly in multi-agent microgrid markets. Method: The paper proposes a fairness-aware trading-agent framework that integrates large language models (LLMs) with reinforcement learning (RL). An LLM serves as a scalable, interpretable, real-time fairness evaluator that replaces inflexible rule-based criteria. A dynamic reward-shaping mechanism, governed by a λ-coefficient scheduler, jointly optimizes two fairness metrics: Fairness-To-Buyer (FTB) and Fairness-Between-Sellers (FBS). Each trading agent is trained independently with Proximal Policy Optimization (Independent PPO, IPPO). Contribution/Results: In simulated microgrid experiments, the framework satisfies over 90% of buyer demand, narrows profit disparities among sellers, and reaches stable FTB and FBS scores above 0.80. Fairness feedback also improves convergence, and an extension to a large power distribution system with household prosumers illustrates the framework's scalability.
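The summary does not specify the form of the λ-coefficient schedule. The sketch below assumes two things. First, each λ warms up linearly from 0 to a cap `lam_max`. Second, every agent's reward is shaped additively as r' = r + λ_FTB(t)·FTB + λ_FBS(t)·FBS. `LambdaSchedule`, `shaped_reward`, `lam_max`, and `warmup_episodes` are hypothetical names that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class LambdaSchedule:
    """Hypothetical linear warm-up schedule for one fairness coefficient.

    lam_max and warmup_episodes are illustrative assumptions, not paper values.
    """
    lam_max: float
    warmup_episodes: int

    def __call__(self, episode: int) -> float:
        # No warm-up requested: use the full coefficient immediately.
        if self.warmup_episodes <= 0:
            return self.lam_max
        # Ramp linearly from 0 to lam_max, then hold constant.
        frac = min(episode / self.warmup_episodes, 1.0)
        return self.lam_max * frac


def shaped_reward(env_reward: float, ftb: float, fbs: float, episode: int,
                  lam_ftb: LambdaSchedule, lam_fbs: LambdaSchedule) -> float:
    """Add scheduled FTB and FBS bonuses to the environment reward.

    Assumption: every agent receives both fairness terms; the paper may
    weight buyers and sellers differently.
    """
    return env_reward + lam_ftb(episode) * ftb + lam_fbs(episode) * fbs


sched = LambdaSchedule(lam_max=0.5, warmup_episodes=100)
r = shaped_reward(1.0, ftb=0.9, fbs=0.8, episode=50, lam_ftb=sched, lam_fbs=sched)
```

With `lam_max=0.5` and `warmup_episodes=100`, an agent at episode 50 has both coefficients at 0.25. With environment reward 1.0, FTB 0.9, and FBS 0.8, it therefore receives 1.0 + 0.25·0.9 + 0.25·0.8 = 1.425. A warm-up is one common choice because it lets agents learn basic trading before the fairness terms carry full weight. The source does not say whether the paper uses a warm-up, a decay, or another schedule.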

📝 Abstract

Peer-to-peer (P2P) trading is increasingly recognized as a key mechanism for decentralized market regulation, yet existing approaches often lack robust frameworks to ensure fairness. This paper presents FairMarket-RL, a novel hybrid framework that combines Large Language Models (LLMs) with Reinforcement Learning (RL) to enable fairness-aware trading agents. In a simulated P2P microgrid with multiple sellers and buyers, the LLM acts as a real-time fairness critic, evaluating each trading episode using two metrics: Fairness-To-Buyer (FTB) and Fairness-Between-Sellers (FBS). These fairness scores are integrated into agent rewards through scheduled λ-coefficients, forming an adaptive LLM-guided reward shaping loop that replaces brittle, rule-based fairness constraints. Agents are trained using Independent Proximal Policy Optimization (IPPO) and achieve equitable outcomes, fulfilling over 90% of buyer demand, maintaining fair seller margins, and consistently reaching FTB and FBS scores above 0.80. The training process demonstrates that fairness feedback improves convergence, reduces buyer shortfalls, and narrows profit disparities between sellers. With its language-based critic, the framework scales naturally, and its extension to a large power distribution system with household prosumers illustrates its practical applicability. FairMarket-RL thus offers a scalable, equity-driven solution for autonomous trading in decentralized energy systems.
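The abstract describes an LLM critic that scores each trading episode on FTB and FBS. The sketch below shows one way to wire that critic into the loop. It assumes the LLM is any text-in/text-out callable and that the model is asked to reply in JSON. The prompt wording, the JSON keys, and the helper names are hypothetical, and the paper's actual prompt is not reproduced here.

```python
import json
from typing import Callable


def build_critic_prompt(episode_log: dict) -> str:
    """Format one trading episode for an LLM fairness critic (hypothetical prompt)."""
    return (
        "You are a fairness critic for a peer-to-peer energy market.\n"
        "Rate this episode on two metrics in [0, 1]:\n"
        "- FTB (Fairness-To-Buyer): were buyers' demands met at reasonable prices?\n"
        "- FBS (Fairness-Between-Sellers): were profits shared evenly among sellers?\n"
        'Reply with JSON only, e.g. {"FTB": 0.85, "FBS": 0.9}.\n\n'
        f"Episode data:\n{json.dumps(episode_log, sort_keys=True)}"
    )


def _clamp01(x: float) -> float:
    # Keep scores inside [0, 1] even if the model replies out of range.
    return max(0.0, min(1.0, float(x)))


def parse_critic_reply(reply: str) -> tuple[float, float]:
    """Extract (FTB, FBS) from the critic's JSON reply, clamped to [0, 1]."""
    data = json.loads(reply)
    return _clamp01(data["FTB"]), _clamp01(data["FBS"])


def evaluate_episode(episode_log: dict, llm: Callable[[str], str]) -> tuple[float, float]:
    """Send the episode to an (assumed) text-in/text-out LLM and return its scores."""
    return parse_critic_reply(llm(build_critic_prompt(episode_log)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model client, used only to exercise the pipeline.
    return '{"FTB": 0.92, "FBS": 1.3}'


scores = evaluate_episode({"demand": [5, 3], "delivered": [5, 2]}, fake_llm)
```

Here the out-of-range FBS value of 1.3 is clamped to 1.0, so `scores` is `(0.92, 1.0)`. In practice, one would also catch `json.JSONDecodeError` and `KeyError` and then retry or fall back to a neutral score.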

Problem

Research questions and friction points this paper is trying to address.

Ensures fairness in P2P trading with LLM-guided reinforcement learning

Balances buyer demand and seller margins using adaptive fairness metrics

Improves convergence and reduces profit disparities in decentralized markets
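In FairMarket-RL, the fairness scores come from the LLM critic rather than from fixed formulas. Simple numeric proxies can still help for logging or sanity-checking those scores. Below, an FTB-like proxy is the fraction of total requested energy actually delivered. An FBS-like proxy is one minus the Gini coefficient of seller profits. Both proxies and all function names are illustrative assumptions, not the paper's metric definitions.

```python
def demand_fulfillment(requested: list[float], delivered: list[float]) -> float:
    """Fraction of total buyer demand met (illustrative FTB-like proxy).

    Over-delivery to a buyer is capped at that buyer's request.
    """
    total = sum(requested)
    if total == 0:
        return 1.0
    met = sum(min(d, r) for r, d in zip(requested, delivered))
    return met / total


def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n on sorted x.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def seller_equity(profits: list[float]) -> float:
    """1 - Gini of seller profits (illustrative FBS-like proxy); assumes profits >= 0."""
    return 1.0 - gini(profits)
```

For example, delivering 18 of 20 requested units gives a demand proxy of 0.9. Four sellers with equal profits give a seller-equity proxy of 1.0.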

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided fairness shaping for multi-agent RL

Adaptive reward shaping with λ-scheduled FTB and FBS fairness scores

Fairness-aware trading agents trained with Independent Proximal Policy Optimization (IPPO)
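In IPPO, each agent keeps its own policy and optimizes the standard PPO clipped surrogate on its own experience, treating the other agents as part of the environment. The sketch below computes that objective per agent from log-probabilities and advantages, which would already reflect the fairness-shaped rewards. The clip value ε = 0.2, the batch layout, and the agent names are illustrative assumptions, not the paper's configuration.

```python
import math


def ppo_clip_objective(logp_new: list[float], logp_old: list[float],
                       advantages: list[float], clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate L^CLIP for one agent's batch (to be maximized)."""
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def ippo_objectives(batches: dict[str, dict[str, list[float]]],
                    clip_eps: float = 0.2) -> dict[str, float]:
    """IPPO: each agent's objective is computed only from that agent's own batch."""
    return {
        agent: ppo_clip_objective(b["logp_new"], b["logp_old"], b["adv"], clip_eps)
        for agent, b in batches.items()
    }


objs = ippo_objectives({
    "seller_1": {"logp_new": [0.0], "logp_old": [0.0], "adv": [0.5]},
    "buyer_1": {"logp_new": [0.0], "logp_old": [0.0], "adv": [2.0]},
})
```

Clipping caps the incentive to move a policy far from the one that collected the data. With a probability ratio of 2 and a positive advantage, the term is limited to 1.2·A. With a negative advantage, the unclipped −2·A term is kept. Because each agent's update depends only on its own batch, sellers and buyers can be trained in parallel without a centralized critic.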

🔎 Similar Papers

Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues