🤖 AI Summary
Traditional Elo systems misestimate player skill in stochastic, imperfect-information games such as Rummy: they rely solely on win/loss outcomes and ignore variability in initial game states, particularly hand quality, so ratings confound skill with luck.
Method: We propose an enhanced Elo framework that explicitly models initial hand quality via a computationally efficient hand-evaluation model and a hand-normalized performance metric, thereby decoupling skill from stochasticity. Parameters are calibrated and validated on a simulation corpus of 270,000 games played by diverse strategic agents.
Contribution/Results: Experiments across six strategy-pairing scenarios demonstrate significantly improved rating stability, a 19.3% gain in match outcome prediction accuracy, and markedly superior skill discriminability compared to standard Elo—especially under low-sample-size and high-variance conditions.
📝 Abstract
Rating systems play a crucial role in evaluating player skill across competitive environments. The Elo rating system, originally designed for deterministic, perfect-information games such as chess, has been widely adopted and modified in various domains. However, the traditional Elo rating system uses only game outcomes to compute ratings and assumes that all players start from equivalent initial states. This raises important methodological challenges for skill modelling in popular, partially randomized, imperfect-information games such as Rummy. In this paper, we examine the limitations of conventional Elo ratings when applied to luck-driven environments and propose a modified Elo framework specifically tailored for Rummy. Our approach incorporates score-based performance metrics and explicitly models the influence of initial hand quality to disentangle skill from luck. Through extensive simulations involving 270,000 games across six strategies of varying sophistication, we demonstrate that our proposed system achieves stable convergence, superior discriminative power, and enhanced predictive accuracy compared to traditional Elo formulations. The framework maintains computational simplicity while effectively capturing the interplay of skill, strategy, and randomness, with broad applicability to other stochastic competitive environments.
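To make the contrast concrete, the sketch below places the standard Elo update next to one possible hand-aware variant. The standard update is the well-known logistic formula. The variant is only an illustration of the idea described in the abstract and is not the paper's actual formulation: the function `hand_adjusted_update`, the `hand_scale` parameter, and the assumption that hand quality and score-based performance are already normalized to [0, 1] are all hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Standard Elo update. score_a is 1.0 (win), 0.5 (draw), or 0.0 (loss)."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def hand_adjusted_update(
    rating_a: float,
    rating_b: float,
    perf_a: float,       # ASSUMPTION: score-based performance of A in [0, 1]
    hand_a: float,       # ASSUMPTION: initial hand quality of A in [0, 1]
    hand_b: float,       # ASSUMPTION: initial hand quality of B in [0, 1]
    k: float = 32.0,
    hand_scale: float = 400.0,  # HYPOTHETICAL: rating points per unit of hand gap
):
    """Illustrative variant: a better starting hand raises the expected result,
    so a win with a strong hand earns less than a win with a weak one."""
    hand_bonus = hand_scale * (hand_a - hand_b)
    delta = k * (perf_a - expected_score(rating_a + hand_bonus, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(elo_update(1500, 1500, 1.0))
    print(hand_adjusted_update(1500, 1500, 1.0, hand_a=0.8, hand_b=0.2))
    print(hand_adjusted_update(1500, 1500, 1.0, hand_a=0.2, hand_b=0.8))
```

In this sketch, two equally rated players with equal hands reduce to the standard update, while a win achieved with the stronger hand moves the ratings less than a win achieved with the weaker hand. How the paper actually evaluates hands and normalizes performance is not specified in the abstract.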