🤖 AI Summary
This work addresses the fairness challenge for AI agents in non-embedded real-time strategy (RTS) card games—exemplified by *Clash Royale*—by proposing the first vision-only, real-time autonomous decision-making framework that operates solely on raw screen pixels without accessing internal game states. Methodologically, we construct a generative object detection dataset tailored to *Clash Royale*; design an end-to-end perception–decision–control pipeline integrating YOLOv8-based visual detection, PaddleOCR-based text recognition, and offline reinforcement learning (Conservative Q-Learning, CQL); and employ a feature-fusion network with lightweight deployment optimizations. Contributions include: (1) the first fully vision-driven closed-loop control system for non-embedded RTS games; (2) real-time inference at 30 FPS on iPhone hardware, consistently outperforming the built-in AI; and (3) full open-sourcing of code, establishing a new benchmark for non-embedded game AI research.
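The decision component named above is Conservative Q-Learning (CQL), which augments a standard TD objective with a regularizer that pushes down Q-values on all actions while pushing up Q-values on actions actually seen in the offline dataset. A minimal NumPy sketch of that regularizer is below; the function name, shapes, and discrete-action formulation are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def cql_penalty(q_values: np.ndarray, action_idx: np.ndarray) -> float:
    """Discrete-action CQL regularizer (illustrative sketch).

    q_values:   (batch, num_actions) Q estimates for each state in the batch
    action_idx: (batch,) index of the action taken in the offline dataset

    Returns mean[ logsumexp_a Q(s, a) - Q(s, a_data) ], which is added
    (scaled by a coefficient alpha) to the usual Bellman error.
    """
    # Numerically stabilized log-sum-exp over the action dimension.
    m = q_values.max(axis=1, keepdims=True)
    logsumexp = (m + np.log(np.exp(q_values - m).sum(axis=1, keepdims=True))).squeeze(1)
    # Q-value of the logged (dataset) action for each batch element.
    q_data = q_values[np.arange(len(action_idx)), action_idx]
    return float((logsumexp - q_data).mean())
```

Because log-sum-exp upper-bounds the maximum Q-value, the penalty is always non-negative and vanishes only when the dataset action dominates, which is what keeps the learned policy conservative on out-of-distribution actions.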
📝 Abstract
Significant progress has been made in AI for games, including board games, MOBA, and RTS games. However, complex agents are typically developed in an embedded manner with direct access to internal game-state information, whereas human players must rely on noisy visual input, which makes the competition unfair. Developing complex non-embedded agents remains challenging, especially in card-based RTS games with complex features and large state spaces. We propose a non-embedded offline reinforcement learning training strategy that uses only visual inputs to achieve real-time autonomous gameplay in the RTS game Clash Royale. Because no object detection dataset exists for this game, we designed an efficient generative object detection dataset for training. We extract features using state-of-the-art object detection and optical character recognition models. Our method performs real-time image acquisition, perception feature fusion, decision-making, and control on mobile devices, successfully defeating the built-in AI opponents. All code is open-sourced at https://github.com/wty-yy/katacr.
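A common way to build such a generative detection dataset is to composite unit sprites onto arena background images and emit the bounding-box label for free, since the paste location is known. The sketch below illustrates this idea in NumPy; the function name, the YOLO-style normalized `(cx, cy, w, h)` label format, and the rectangular (no alpha mask) paste are assumptions for illustration, not the paper's exact generation procedure.

```python
import numpy as np

def paste_sprite(arena: np.ndarray, sprite: np.ndarray, x: int, y: int):
    """Paste a unit sprite onto an arena image at top-left corner (x, y).

    arena:  (H, W, 3) background image
    sprite: (h, w, 3) unit crop to composite

    Returns the composited image and a YOLO-style normalized box
    (center_x, center_y, width, height) usable as a detection label.
    """
    h, w = sprite.shape[:2]
    H, W = arena.shape[:2]
    out = arena.copy()
    out[y:y + h, x:x + w] = sprite  # overwrite the region with the sprite
    box = ((x + w / 2) / W, (y + h / 2) / H, w / W, h / H)
    return out, box
```

Sampling sprite identity, position, and count at random yields arbitrarily many labeled training images without manual annotation.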