🤖 AI Summary
This work addresses the illicit copying and misuse of game-playing AI strategies, such as cheating in online chess. It adapts the KGW watermarking technique from large language models to perfect-information extensive-form games, which the authors present as the first study of watermarking game-playing strategies. The method embeds covert, statistically detectable signals into agent policies so that their source can be attributed. The study combines a statistical hypothesis test with a game-theoretic analysis of strategy profiles. It bounds the loss in expected utility caused by the watermark and shows a trade-off between policy quality and watermark detectability. Experiments with several chess engines show that the watermark causes negligible degradation in playing strength and can be detected with high confidence from only a small number of games.
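The summary describes embedding a covert signal into an agent's policy. Below is a minimal sketch of how a KGW-style scheme could do this in a game. It assumes that the current state plays the role the preceding token plays in KGW. A secret key and the state together seed a pseudorandom split of the legal moves, marking a fraction `gamma` of them as "green". The engine's move scores, treated here as logits, then receive a bonus `delta` on green moves before sampling. The function names, the SHA-256 keying, and the parameters `gamma` and `delta` are illustrative assumptions, not the paper's exact construction.

```python
import hashlib
import math
import random


def green_list(state_key: str, legal_moves: list[str], secret_key: str,
               gamma: float = 0.5) -> set[str]:
    """Pseudorandomly mark about a gamma-fraction of the legal moves as 'green'.

    Hypothetical keying: the seed is a hash of the secret key and a string
    identifying the state (e.g. a FEN in chess), so the same state always
    gets the same split and a verifier holding the key can recompute it.
    """
    digest = hashlib.sha256(f"{secret_key}|{state_key}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    moves = sorted(legal_moves)  # canonical order, independent of input order
    rng.shuffle(moves)
    k = max(1, round(gamma * len(moves)))  # at least one green move
    return set(moves[:k])


def watermarked_move(state_key: str, move_logits: dict[str, float],
                     secret_key: str, delta: float = 2.0, gamma: float = 0.5,
                     rng=None) -> str:
    """Sample a move after adding `delta` to the logits of green moves (hypothetical API)."""
    if rng is None:
        rng = random.Random()
    green = green_list(state_key, list(move_logits), secret_key, gamma)
    biased = {m: s + (delta if m in green else 0.0) for m, s in move_logits.items()}
    top = max(biased.values())  # subtract the max for a numerically stable softmax
    moves = list(biased)
    weights = [math.exp(biased[m] - top) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

For example, `watermarked_move("startpos", {"e4": 1.0, "d4": 0.9, "Nf3": 0.5, "c4": 0.4}, secret_key="k")` returns one of the four moves, with green moves favored. A larger `delta` shifts more probability onto green moves. This makes detection easier but moves the policy further from the engine's preferred distribution, which is the quality/detectability trade-off described in the abstract.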
📝 Abstract
Watermarking techniques for large language models (LLMs), which encode hidden information in the output so that its source can be verified, have recently gained significant attention because of their potential to detect accidental or deliberate misuse. Similar misuse challenges arise in game-playing, for example in detecting the unauthorized use of AI tools on gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but that there is a trade-off between detectability and quality. In our experiments, we apply the watermarking framework to various chess engines and demonstrate that (a) the watermark has a negligible impact on the quality of the strategy and (b) the watermark can be detected with just a handful of games.
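For detection, the original KGW watermark for LLMs uses a one-proportion z-test on the number of green tokens. The sketch below applies that test to observed moves. It assumes that, without a watermark, each move lands in the green list independently with probability `gamma`. The abstract does not state the exact test used for games, so the function names and the detection threshold here are hypothetical.

```python
import math


def kgw_z_score(green_hits: int, total_moves: int, gamma: float = 0.5) -> float:
    """One-proportion z-statistic; under H0 (no watermark) each move is green w.p. gamma."""
    expected = gamma * total_moves
    std = math.sqrt(total_moves * gamma * (1 - gamma))
    return (green_hits - expected) / std


def one_sided_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def moves_needed(green_rate: float, gamma: float = 0.5, z_threshold: float = 4.0) -> int:
    """Smallest T whose *expected* z-score reaches z_threshold, assuming the
    watermarked policy plays green moves at a (hypothetical) rate green_rate > gamma."""
    # expected z = (green_rate - gamma) * sqrt(T) / sqrt(gamma * (1 - gamma))
    t = (z_threshold * math.sqrt(gamma * (1 - gamma)) / (green_rate - gamma)) ** 2
    return math.ceil(t)
```

The test is one-sided because the watermark can only raise the green rate. As an illustration with made-up numbers, suppose a watermarked player plays green moves 75% of the time with `gamma = 0.5`. Then `moves_needed(0.75)` returns 64, and observing 48 green moves out of 64 gives `kgw_z_score(48, 64) == 4.0` with a one-sided p-value of about 3.2e-5. This kind of arithmetic suggests why a modest number of games can be enough for detection.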