ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing RAG evaluation benchmarks adapt poorly to dynamic domains such as online games, where rapid content updates and evolving player community interests jointly create dual dynamics; moreover, synthetically generated questions often diverge from authentic player needs. Method: We propose ChronoPlay, the first framework to model both dual dynamics (game version evolution and community topic drift) and player-centered authenticity. It introduces a dual-dynamic update mechanism, which tracks official patch releases and community discourse shifts in parallel, and a dual-source synthesis engine that integrates official documentation and community-generated corpora to enable automated, continuous benchmark generation. Contribution/Results: Instantiated on three major online games, ChronoPlay establishes the first dynamic RAG benchmark for gaming. Experiments expose bottlenecks in current RAG systems in temporal freshness, factual consistency, and player intent comprehension, pointing to a new way of evaluating RAG in dynamic domains.

📝 Abstract
Retrieval Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, the necessity of automating such a benchmark introduces a critical requirement for player-centric authenticity to ensure generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay utilizes a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws from official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.
Problem

Research questions and friction points this paper is trying to address.

No dedicated benchmark exists for game RAG systems
Dual dynamics: game content updates interact with shifting player focus
Automatically generated questions must stay authentic to real player needs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-dynamic update mechanism tracks game and player changes
Dual-source synthesis engine combines official and community data
Automated continuous generation of dynamic gaming RAG benchmarks
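
The two mechanisms above are described only at a high level on this page. As a rough illustration, and not the authors' implementation, the Python sketch below shows one way a dual-dynamic refresh step could look: questions whose answers depend on entities changed by a patch are marked stale, and community topics that are under-represented among the remaining questions are flagged for new question synthesis. All names (`Question`, `refresh_benchmark`, `drift_threshold`) and the share-difference rule are hypothetical assumptions introduced for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical benchmark item (not taken from the ChronoPlay code)."""
    text: str
    topic: str            # community topic the question belongs to
    entities: frozenset   # game entities the reference answer depends on


def refresh_benchmark(questions, changed_entities, topic_counts,
                      drift_threshold=0.2):
    """Assumed refresh rule, for illustration only.

    Returns (kept, stale, under_covered_topics):
    - stale: questions touching any entity changed by the latest patch
    - kept: all other questions
    - under_covered_topics: topics whose share of community posts exceeds
      their share of kept questions by more than drift_threshold
    """
    changed = set(changed_entities)
    kept = [q for q in questions if not (q.entities & changed)]
    stale = [q for q in questions if q.entities & changed]

    total_posts = sum(topic_counts.values())
    if total_posts == 0:
        # No community signal available, so no topic can be flagged.
        return kept, stale, []

    bench_counts = Counter(q.topic for q in kept)
    total_kept = len(kept)
    under_covered = []
    for topic, n_posts in topic_counts.items():
        community_share = n_posts / total_posts
        bench_share = bench_counts[topic] / total_kept if total_kept else 0.0
        if community_share - bench_share > drift_threshold:
            under_covered.append(topic)
    return kept, stale, sorted(under_covered)
```

In this sketch the two signals are handled separately on purpose: patch changes invalidate existing items, while topic drift only decides where new items are needed. A real pipeline would also have to regenerate answers for stale questions from the updated official sources.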
Liyang He
Tencent
Yuren Zhang
Tencent
Ziwei Zhu
The Chinese University of Hong Kong
Zhenghui Li
Independent Researcher
Shiwei Tong
Data scientist at Tencent Games; PhD, University of Science and Technology of China
GDM, EDM, NLP