Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the multi-step resource competition problem between two players under graph-structured constraints, formulated as a Multi-step Colonel Blotto Game on Graphs (MCBG). To address the challenges of dynamic action spaces and graph-topological constraints in strategy computation, we propose an adjacency-matrix-driven action-displacement mechanism that dynamically generates feasible action sets respecting the graph structure. We further cast the MCBG as a Markov Decision Process and apply deep reinforcement learning methods, namely Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), to enable adaptive policy optimization under asymmetric graph topologies and initial resource disadvantages. Experimental results demonstrate that our method significantly outperforms baseline strategies, including random and greedy policies, across diverse graph structures; converges stably to a 50% win rate against adversarial learned opponents; and actively exploits structural asymmetries to enhance resource allocation efficiency.

📝 Abstract
Game-theoretic resource allocation on graphs (GRAG) involves two players competing over multiple steps to control nodes of interest on a graph, a problem modeled as a multi-step Colonel Blotto Game (MCBG). Finding optimal strategies is challenging due to the dynamic action space and structural constraints imposed by the graph. To address this, we formulate the MCBG as a Markov Decision Process (MDP) and apply Reinforcement Learning (RL) methods, specifically Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). To enforce graph constraints, we introduce an action-displacement adjacency matrix that dynamically generates valid action sets at each step. We evaluate RL performance across a variety of graph structures and initial resource distributions, comparing against random, greedy, and learned RL policies. Experimental results show that both DQN and PPO consistently outperform baseline strategies and converge to a balanced 50% win rate when competing against the learned RL policy. In particular, on asymmetric graphs, RL agents successfully exploit structural advantages and adapt their allocation strategies, even under disadvantageous initial resource distributions.
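The action-displacement constraint described in the abstract can be illustrated with a minimal sketch: a resource unit at a node may only stay put or move along an edge, so the adjacency matrix (with self-loops) directly induces the feasible action set at each step. The function and encoding below are hypothetical simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def valid_moves(adj: np.ndarray, resources: np.ndarray) -> list[tuple[int, int]]:
    """List feasible single-unit moves (src, dst): a unit at src may move
    to dst only if the graph has an edge src -> dst (self-loops allow
    staying in place). Hypothetical simplification of the paper's
    adjacency-matrix-driven action-displacement mechanism."""
    n = adj.shape[0]
    moves = []
    for src in range(n):
        if resources[src] == 0:
            continue  # no units available to move from this node
        for dst in range(n):
            if adj[src, dst]:
                moves.append((src, dst))
    return moves

# Path graph 0-1-2 with self-loops; all of one player's units sit at node 0.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
res = np.array([3, 0, 0])
print(valid_moves(adj, res))  # [(0, 0), (0, 1)]
```

Regenerating this set after every step is what makes the action space dynamic: as units spread across the graph, the feasible moves change with them.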
Problem

Research questions and friction points this paper is trying to address.

Optimizing game-theoretic resource allocation on graphs
Addressing dynamic action space and graph constraints
Evaluating RL performance against baseline strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulate the MCBG as an MDP and solve it with RL methods
Introduce action-displacement adjacency matrix
Evaluate DQN and PPO against baselines
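One common way such graph constraints enter a DQN-style agent is action masking: Q-values of infeasible actions are suppressed before the greedy argmax, so the policy can only select moves the adjacency structure permits. This is a plausible reading of the mechanism above, sketched here as an assumption rather than the paper's exact method.

```python
import numpy as np

def masked_greedy_action(q_values: np.ndarray, action_mask: np.ndarray) -> int:
    """Pick the highest-Q action among currently feasible ones.
    action_mask[a] == 1 iff action a respects the graph constraints;
    infeasible actions get -inf so argmax can never choose them."""
    masked_q = np.where(action_mask.astype(bool), q_values, -np.inf)
    return int(np.argmax(masked_q))

q = np.array([0.2, 0.9, -0.1, 0.5])
mask = np.array([1, 0, 1, 1])  # action 1 is infeasible at this step
print(masked_greedy_action(q, mask))  # 3
```

The same mask can gate a PPO policy by zeroing the probabilities of infeasible actions and renormalizing before sampling.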