Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of causal reinforcement learning benchmarks for complex systems that combine partial observability, large masked action spaces, and explicit causal structure. We introduce MTG-Causal-RL, a Gymnasium benchmark built on Magic: The Gathering that combines five competitive Standard archetypes, three reward schemes, and a hand-specified structural causal model (SCM) over strategic variables. As a reference causal agent, we propose Causal Graph-Factored Advantage PPO (CGFA-PPO), which uses the SCM parents of win probability as factor-aligned critic targets and adds an intervention-calibration loss. The benchmark treats causal credit assignment, leave-one-out cross-archetype transfer, and policy auditability as first-class metrics. In experiments, masked PPO and CGFA-PPO reach competitive in-distribution win rates and exceed the random baseline, while per-factor calibration trajectories and leave-one-out transfer gaps expose diagnostic structure that scalar win rate alone cannot.
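To make the idea of a masked discrete action space concrete, the sketch below shows one common way a policy can be restricted to legal actions: illegal logits are set to negative infinity before the softmax, so they receive exactly zero probability. This is an illustrative assumption about how masking is typically done, not the benchmark's own code; the names `masked_softmax` and `sample_action` are hypothetical, and only the action count (478) comes from the abstract.

```python
import math
import random

N_ACTIONS = 478  # size of the masked discrete action space reported in the abstract


def masked_softmax(logits, mask):
    """Softmax over legal actions only; illegal actions get probability 0.

    `mask[i]` is True when action i is legal. Hypothetical helper, not the
    benchmark's API.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be legal")
    neg_inf = float("-inf")
    masked = [l if m else neg_inf for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng):
    """Sample an action index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * N_ACTIONS
    mask = [False] * N_ACTIONS
    mask[3] = mask[7] = True  # pretend only two actions are legal this step
    print(sample_action(logits, mask, rng))
```

Masking at the logit level keeps the log-probabilities used in a PPO-style update consistent with the distribution that was actually sampled from, which is why it is usually preferred over rejecting illegal samples after the fact.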

📝 Abstract

Causal reinforcement learning (RL) lacks benchmarks for complex systems that combine sequential decision making, hidden information, large masked action spaces, and explicit causal structure. We introduce MTG-Causal-RL, a Gymnasium benchmark built on Magic: The Gathering with a 3,077-dimensional partial observation, a 478-action masked discrete action space, five competitive Standard archetypes, three reward schemes, and a hand-specified Structural Causal Model (SCM) over strategic variables. Every episode exposes causal variables, SCM-predicted intervention effects, and per-factor credit traces, making causal credit assignment, leave-one-out cross-archetype transfer, and policy auditability first-class metrics. We adapt a panel of reference baselines: random, heuristic, masked PPO, a causal-world-model PPO variant, and an architecture-matched scalar control. We propose Causal Graph-Factored Advantage PPO (CGFA-PPO) as a reference causal agent that uses SCM parents of win probability as factor-aligned critic targets with an intervention-calibration loss. All comparisons use paired seeds, paired-bootstrap confidence intervals, and Holm-Bonferroni correction within pre-registered families. Masked PPO and CGFA-PPO reach competitive in-distribution win rates and exceed the random baseline; per-factor calibration trajectories and leave-one-out transfer gaps expose diagnostic structure that scalar win rate alone cannot. We release the benchmark, reference-baseline results, and full evaluation protocol openly. By coupling a strategically rich, partially observed domain with an explicit causal interface and statistical protocol, MTG-Causal-RL gives causal-RL, world-model, and LLM-agent research a shared testbed for questions current benchmarks cannot pose together: causal credit assignment under masked action spaces, structural transfer across archetypes, and SCM-grounded policy auditability.
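The statistical protocol named in the abstract (paired seeds, paired-bootstrap confidence intervals, Holm-Bonferroni correction) can be illustrated with a small standard-library sketch. It assumes that each agent yields one win rate per seed and that p-values for a family of comparisons are already available; the function names, the percentile-interval choice, and the example numbers are illustrative assumptions, not the paper's released evaluation code.

```python
import random


def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a - b) over seed-paired scores.

    a[i] and b[i] must come from the same seed, so differences are
    resampled as pairs rather than resampling each agent independently.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(diffs) / n, lo, hi


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure; returns reject flags in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


if __name__ == "__main__":
    # Made-up per-seed win rates, for illustration only.
    agent = [0.61, 0.58, 0.64, 0.60, 0.63]
    baseline = [0.52, 0.55, 0.50, 0.57, 0.51]
    print(paired_bootstrap_ci(agent, baseline))
    print(holm_bonferroni([0.01, 0.04, 0.03]))
```

Resampling the per-seed differences, rather than each agent's scores separately, is what makes the interval "paired": seed-level variance shared by both agents cancels out of the comparison.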

Problem

Research questions and friction points this paper is trying to address.

causal reinforcement learning

benchmark

complex card games

masked action spaces

causal credit assignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Reinforcement Learning

Structural Causal Model

Masked Action Space

Credit Assignment

Policy Auditability
