Playing Non-embedded Card-Based Games with Reinforcement Learning

📅 2025-04-07
🏛️ International Conference on Intelligent Robotics and Applications
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fairness challenge for AI agents in non-embedded real-time strategy (RTS) card games—exemplified by *Clash Royale*—by proposing the first vision-only, real-time autonomous decision-making framework that operates solely on raw screen pixels without accessing internal game states. Methodologically, we construct a generative object detection dataset tailored to *Clash Royale*; design an end-to-end perception–decision–control pipeline integrating YOLOv8-based visual detection, PaddleOCR-based text recognition, and offline reinforcement learning (Conservative Q-Learning, CQL); and employ a feature-fusion network with lightweight deployment optimizations. Contributions include: (1) the first fully vision-driven closed-loop control system for non-embedded RTS games; (2) real-time inference at 30 FPS on iPhone hardware, consistently outperforming the built-in AI; and (3) full open-sourcing of code, establishing a new benchmark for non-embedded game AI research.
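The summary names Conservative Q-Learning (CQL) as the offline RL method. As a hedged illustration (not the authors' released implementation), the distinctive CQL regularizer can be sketched as a log-sum-exp over all actions minus the Q-value of the action actually logged in the dataset, which pushes down Q-estimates for actions the offline data never took:

```python
import numpy as np

def cql_penalty(q_values: np.ndarray, data_actions: np.ndarray) -> float:
    """Conservative Q-Learning regularizer (illustrative sketch).

    q_values:     (batch, n_actions) Q estimates for each state in the batch.
    data_actions: (batch,) indices of the actions actually taken in the dataset.

    Returns the mean of  logsumexp_a Q(s, a) - Q(s, a_data).
    """
    # Numerically stable log-sum-exp over the action dimension.
    m = q_values.max(axis=1, keepdims=True)
    lse = m.squeeze(1) + np.log(np.exp(q_values - m).sum(axis=1))
    q_taken = q_values[np.arange(len(data_actions)), data_actions]
    return float((lse - q_taken).mean())

# The full CQL objective adds this penalty, scaled by a coefficient alpha,
# to the ordinary Bellman error of the Q-network.
```

The penalty is always non-negative and shrinks toward zero as the logged action dominates the Q-values, which is the conservatism the offline setting needs.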

📝 Abstract
Significant progress has been made in AI for games, including board games, MOBA, and RTS games. However, complex agents are typically developed in an embedded manner, directly accessing game state information, unlike human players who rely on noisy visual data, leading to unfair competition. Developing complex non-embedded agents remains challenging, especially in card-based RTS games with complex features and large state spaces. We propose a non-embedded offline reinforcement learning training strategy using visual inputs to achieve real-time autonomous gameplay in the RTS game Clash Royale. Due to the lack of an object detection dataset for this game, we designed an efficient generative object detection dataset for training. We extract features using state-of-the-art object detection and optical character recognition models. Our method enables real-time image acquisition, perception feature fusion, decision-making, and control on mobile devices, successfully defeating built-in AI opponents. All code is open-sourced at https://github.com/wty-yy/katacr.
Problem

Research questions and friction points this paper is trying to address.

Develop non-embedded AI agents using visual inputs for fair gameplay
Address lack of object detection dataset for card-based RTS games
Enable real-time autonomous gameplay on mobile devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-embedded RL training with visual inputs
Generative object detection dataset creation
Real-time mobile feature fusion and control
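The three innovation bullets above describe a closed perception–decision–control loop: detect units and read text from raw pixels, fuse those signals into one state, and act. A minimal sketch of how such a loop might be wired (every function here is a hypothetical stand-in, not the released KataCR code; the real system would call YOLOv8, PaddleOCR, and a trained CQL policy where the stubs sit):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    unit_features: np.ndarray  # detector outputs, e.g. (class, x, y) per unit
    elixir: int                # OCR-read resource counter

def perceive(frame: np.ndarray) -> Observation:
    """Stand-in for the vision stage (object detection + OCR)."""
    # A real system would run YOLOv8/PaddleOCR here; we return fixed outputs.
    return Observation(unit_features=np.zeros((4, 3)), elixir=5)

def fuse(obs: Observation) -> np.ndarray:
    """Flatten heterogeneous perception outputs into one state vector."""
    return np.concatenate([obs.unit_features.ravel(), [obs.elixir]])

def decide(state: np.ndarray) -> int:
    """Stand-in policy: a trained offline-RL Q-network would go here."""
    q = np.tanh(state).sum() * np.arange(3)  # dummy scores for 3 actions
    return int(np.argmax(q))

def control_loop(frames) -> list[int]:
    """Closed loop: screen pixels in, discrete actions out."""
    actions = []
    for frame in frames:
        state = fuse(perceive(frame))
        actions.append(decide(state))  # would be issued as a tap/drag
    return actions
```

The design point the bullets emphasize is that nothing in this loop touches internal game state: the only input is the captured frame, which is what makes the agent "non-embedded" and the competition with human players fair.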
Tianyang Wu
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, No. 28 West Xianning Road, Xi’an, PR China.
Lipeng Wan
Georgia State University
Scientific Data Management, HPC, Data-Intensive Computing, Storage and I/O, System Resilience
Yuhang Wang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, No. 28 West Xianning Road, Xi’an, PR China.
Qiang Wan
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, No. 28 West Xianning Road, Xi’an, PR China.
Xuguang Lan
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, No. 28 West Xianning Road, Xi’an, PR China.