🤖 AI Summary
Current AI systems exhibit significantly inferior real-time card selection performance compared to human experts during the Draft phase of *Magic: The Gathering* (MTG).
Method: We propose the first LoRA-based fine-tuning approach for large language models (LLMs) to address this challenge—adapting purely textual LLMs to partially observable, long-horizon card drafting decisions. Leveraging open-source LLMs, we perform low-rank adaptation via instruction tuning and domain-specific prompting on annotated Draft log data.
Contribution/Results: With only 10,000 training steps, our lightweight model achieves 66.2% card selection accuracy—substantially outperforming zero-shot GPT-4o (43%) and approaching the performance of dedicated models. Crucially, it demonstrates rapid generalization to unseen MTG expansions. This work constitutes the first empirical validation of the feasibility and effectiveness of lightweight LLM fine-tuning for real-time strategic decision-making in collectible card games (CCGs).
📝 Abstract
Collectible card games (CCGs) are a difficult genre for AI due to their partial observability, long-term decision-making, and evolving card sets. As a result, current AI models perform vastly worse than human players at CCG tasks such as deckbuilding and gameplay. In this work, we introduce *UrzaGPT*, a domain-adapted large language model that recommends real-time drafting decisions in *Magic: The Gathering*. Starting from an open-weight LLM, we use Low-Rank Adaptation fine-tuning on a dataset of annotated draft logs. With this, we leverage the language modeling capabilities of LLMs and can quickly adapt to different expansions of the game. We benchmark *UrzaGPT* against zero-shot LLMs and the state-of-the-art domain-specific model. Untuned, small LLMs like Llama-3-8B are completely unable to draft, while the larger GPT-4o achieves a zero-shot accuracy of 43%. Fine-tuning a smaller model with UrzaGPT, we achieve an accuracy of 66.2% using only 10,000 steps. Although this does not reach the capability of domain-specific models, we show that drafting with LLMs alone is possible, and we conclude that LLMs can enable performant, general, and update-friendly drafting AIs in the future.
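The core idea of Low-Rank Adaptation is to freeze the pretrained weight matrix and learn only a small low-rank update, which is why so few training steps suffice. The following is a minimal NumPy sketch of a LoRA-adapted linear layer; the dimensions, rank, and scaling factor are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative sizes (not from the paper): a 4096x4096 layer
# adapted with rank r = 8 and scaling factor alpha = 16.
d_in, d_out, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the scaled low-rank update: W x + (alpha/r) * B A x.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts identical to the base layer,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size            # parameters updated by full fine-tuning
lora_params = A.size + B.size   # parameters updated by LoRA
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

For these sizes, LoRA trains roughly 0.39% of the layer's parameters, which is what makes per-expansion re-adaptation cheap compared to full fine-tuning.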