🤖 AI Summary
This work addresses the fairness challenge for AI agents in non-embedded real-time strategy (RTS) card games—exemplified by *Clash Royale*—by proposing the first vision-only, real-time autonomous decision-making framework that operates solely on raw screen pixels without accessing internal game states. Methodologically, we construct a generative object detection dataset tailored to *Clash Royale*; design an end-to-end perception–decision–control pipeline integrating YOLOv8-based visual detection, PaddleOCR-based text recognition, and offline reinforcement learning (Conservative Q-Learning, CQL); and employ a feature-fusion network with lightweight deployment optimizations. Contributions include: (1) the first fully vision-driven closed-loop control system for non-embedded RTS games; (2) real-time inference at 30 FPS on iPhone hardware, consistently outperforming the built-in AI; and (3) full open-sourcing of code, establishing a new benchmark for non-embedded game AI research.
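The decision component named above is Conservative Q-Learning (CQL), which augments a standard TD objective with a regularizer that pushes down Q-values on all actions while pushing up Q-values on actions actually seen in the offline dataset. A minimal NumPy sketch of that regularizer is below; the function name, shapes, and discrete-action formulation are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def cql_penalty(q_values: np.ndarray, action_idx: np.ndarray) -> float:
    """Discrete-action CQL regularizer (illustrative sketch).

    q_values:   (batch, num_actions) Q estimates for each state in the batch
    action_idx: (batch,) index of the action taken in the offline dataset

    Returns mean[ logsumexp_a Q(s, a) - Q(s, a_data) ], which is added
    (scaled by a coefficient alpha) to the usual Bellman error.
    """
    # Numerically stabilized log-sum-exp over the action dimension.
    m = q_values.max(axis=1, keepdims=True)
    logsumexp = (m + np.log(np.exp(q_values - m).sum(axis=1, keepdims=True))).squeeze(1)
    # Q-value of the logged (dataset) action for each batch element.
    q_data = q_values[np.arange(len(action_idx)), action_idx]
    return float((logsumexp - q_data).mean())
```

Because log-sum-exp upper-bounds the maximum Q-value, the penalty is always non-negative and vanishes only when the dataset action dominates, which is what keeps the learned policy conservative on out-of-distribution actions.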
📝 Abstract
Significant progress has been made in AI for games, including board games, MOBA, and RTS games. However, complex agents are typically developed in an embedded manner with direct access to internal game-state information, whereas human players must rely on noisy visual input, which makes the competition unfair. Developing complex non-embedded agents remains challenging, especially in card-based RTS games with complex features and large state spaces. We propose a non-embedded offline reinforcement learning training strategy that uses only visual inputs to achieve real-time autonomous gameplay in the RTS game Clash Royale. Because no object detection dataset exists for this game, we designed an efficient generative object detection dataset for training. We extract features using state-of-the-art object detection and optical character recognition models. Our method performs real-time image acquisition, perception feature fusion, decision-making, and control on mobile devices, successfully defeating the built-in AI opponents. All code is open-sourced at https://github.com/wty-yy/katacr.
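A common way to build such a generative detection dataset is to composite unit sprites onto arena background images and emit the bounding-box label for free, since the paste location is known. The sketch below illustrates this idea in NumPy; the function name, the YOLO-style normalized `(cx, cy, w, h)` label format, and the rectangular (no alpha mask) paste are assumptions for illustration, not the paper's exact generation procedure.

```python
import numpy as np

def paste_sprite(arena: np.ndarray, sprite: np.ndarray, x: int, y: int):
    """Paste a unit sprite onto an arena image at top-left corner (x, y).

    arena:  (H, W, 3) background image
    sprite: (h, w, 3) unit crop to composite

    Returns the composited image and a YOLO-style normalized box
    (center_x, center_y, width, height) usable as a detection label.
    """
    h, w = sprite.shape[:2]
    H, W = arena.shape[:2]
    out = arena.copy()
    out[y:y + h, x:x + w] = sprite  # overwrite the region with the sprite
    box = ((x + w / 2) / W, (y + h / 2) / H, w / W, h / H)
    return out, box
```

Sampling sprite identity, position, and count at random yields arbitrarily many labeled training images without manual annotation.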