🤖 AI Summary
Current AI systems exhibit significantly inferior real-time card selection performance compared to human experts during the Draft phase of *Magic: The Gathering* (MTG).
Method: We propose the first LoRA-based fine-tuning approach for large language models (LLMs) to address this challenge—adapting purely textual LLMs to partially observable, long-horizon card drafting decisions. Leveraging open-source LLMs, we perform low-rank adaptation via instruction tuning and domain-specific prompting on annotated Draft log data.
Contribution/Results: With only 10,000 training steps, our lightweight model achieves 66.2% card selection accuracy—substantially outperforming zero-shot GPT-4o (43%) and approaching the performance of dedicated models. Crucially, it demonstrates rapid generalization to unseen MTG expansions. This work constitutes the first empirical validation of the feasibility and effectiveness of lightweight LLM fine-tuning for real-time strategic decision-making in collectible card games (CCGs).
📝 Abstract
Collectible card games (CCGs) are a difficult genre for AI due to their partial observability, long-term decision-making, and evolving card sets. As a result, current AI models perform vastly worse than human players at CCG tasks such as deckbuilding and gameplay. In this work, we introduce *UrzaGPT*, a domain-adapted large language model that recommends real-time drafting decisions in *Magic: The Gathering*. Starting from an open-weight LLM, we use Low-Rank Adaptation fine-tuning on a dataset of annotated draft logs. With this, we leverage the language modeling capabilities of LLMs and can quickly adapt to different expansions of the game. We benchmark *UrzaGPT* against zero-shot LLMs and the state-of-the-art domain-specific model. Untuned, small LLMs like Llama-3-8B are completely unable to draft, while the larger GPT-4o achieves a zero-shot accuracy of 43%. Fine-tuning a smaller model with UrzaGPT, we achieve an accuracy of 66.2% using only 10,000 steps. Although this does not reach the capability of domain-specific models, we show that drafting with LLMs alone is possible, and we conclude that LLMs can enable performant, general, and update-friendly drafting AIs in the future.
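The core idea of Low-Rank Adaptation is to freeze the pretrained weight matrix and learn only a small low-rank update, which is why so few training steps suffice. The following is a minimal NumPy sketch of a LoRA-adapted linear layer; the dimensions, rank, and scaling factor are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative sizes (not from the paper): a 4096x4096 layer
# adapted with rank r = 8 and scaling factor alpha = 16.
d_in, d_out, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the scaled low-rank update: W x + (alpha/r) * B A x.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts identical to the base layer,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size            # parameters updated by full fine-tuning
lora_params = A.size + B.size   # parameters updated by LoRA
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

For these sizes, LoRA trains roughly 0.39% of the layer's parameters, which is what makes per-expansion re-adaptation cheap compared to full fine-tuning.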