🤖 AI Summary
Existing RAG evaluation benchmarks adapt poorly to dynamic domains such as online games, where rapid content updates and evolving player community interests together create dual dynamics; moreover, synthetically generated questions often diverge from authentic player needs. Method: We propose ChronoPlay, the first framework to model both dual dynamics (game version evolution and community topic drift) and player-centric authenticity. It introduces a dual-dynamic update mechanism, which synchronously tracks official patch releases and shifts in community discourse, and a dual-source synthesis engine, which integrates official documentation and community-generated corpora to enable automated, continuous benchmark generation. Contribution/Results: Instantiated across three major online games, ChronoPlay establishes the first dynamic RAG benchmark for gaming. Experiments expose critical bottlenecks in current RAG systems in temporal freshness, factual consistency, and player intent comprehension, and offer a basis for evaluating RAG in dynamic domains.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, automating such a benchmark introduces a critical requirement for player-centric authenticity, so that generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay uses a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws on official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.
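
To make the idea of a dual-dynamic update trigger concrete, the following is a minimal sketch and not ChronoPlay's actual implementation. It assumes that a benchmark refresh is triggered either by a new game version or by drift in the distribution of community topics, and it measures drift with Jensen-Shannon divergence. The function names, the topic labels, and the `drift_threshold` value are all hypothetical choices made for illustration; the abstract does not specify how either signal is detected.

```python
from collections import Counter
from math import log2


def topic_distribution(topic_labels, vocabulary):
    """Turn a list of topic labels into a probability vector over `vocabulary`."""
    counts = Counter(topic_labels)
    total = len(topic_labels)
    return [counts[topic] / total for topic in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), which lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def refresh_reasons(known_version, latest_version,
                    old_topics, new_topics, drift_threshold=0.1):
    """Return the (hypothetical) reasons a benchmark refresh is due.

    "content_update": the game version changed.
    "topic_drift": community topics shifted by more than `drift_threshold`.
    """
    reasons = []
    if latest_version != known_version:
        reasons.append("content_update")

    # Assumption: with no posts in either window, drift cannot be assessed.
    if old_topics and new_topics:
        vocabulary = sorted(set(old_topics) | set(new_topics))
        p = topic_distribution(old_topics, vocabulary)
        q = topic_distribution(new_topics, vocabulary)
        if js_divergence(p, q) > drift_threshold:
            reasons.append("topic_drift")
    return reasons


if __name__ == "__main__":
    # Hypothetical topic labels assigned to community posts in two time windows.
    before = ["builds", "builds", "quests", "trading"]
    after = ["events", "events", "events", "builds"]
    print(refresh_reasons("1.0", "1.1", before, after))
```

Keeping the two signals separate means a refresh can be attributed to a content change, a shift in player interest, or both, which mirrors the distinction the abstract draws between the two forms of change.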