MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative models for password guessing lack standardized evaluation, leading to inconsistent performance comparisons and unreliable conclusions. To address this, we propose MAYA—the first customizable, plug-and-play benchmarking framework for generative password guessing. MAYA integrates eight real-world password datasets and rigorously defined, multi-dimensional evaluation scenarios to establish a standardized assessment paradigm. We reimplement six state-of-the-art (SOTA) generative models—including RNNs, Transformers, and VAEs—in PyTorch, and introduce novel components: password syntax parsing, entropy-weighted sampling, distribution alignment metrics, and a parallelized large-scale generation–verification pipeline. Extensive experiments (>15,000 GPU hours) show that sequence-based models achieve an average 23.6% improvement in Top-1000 hit rate; multi-model ensemble attacks yield up to 41.2% gain; models excel on short passwords but exhibit significant generalization degradation on long, complex ones. MAYA enables the first cross-model, cross-dataset, and fully reproducible quantitative analysis of generative password modeling capability.
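The Top-N hit rate mentioned above is the core metric of such a generation–verification pipeline: the fraction of a held-out test set cracked by the first N generated guesses. A minimal sketch of that computation (function name and toy data are illustrative, not taken from MAYA's codebase):

```python
def top_n_hit_rate(guesses, test_passwords, n):
    """Fraction of held-out test passwords found among the first n guesses."""
    guessed = set(guesses[:n])  # dedup and cap the guess budget at n
    hits = sum(1 for pw in test_passwords if pw in guessed)
    return hits / len(test_passwords)

# Toy example: 4 generated guesses against 4 held-out passwords.
guesses = ["123456", "password", "qwerty", "letmein"]
test = ["password", "dragon", "qwerty", "sunshine"]
rate = top_n_hit_rate(guesses, test, n=1000)  # 2 of 4 cracked -> 0.5
```

In practice the benchmark scales this to billions of guesses, which is why the paper parallelizes generation and membership checking.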

📝 Abstract
The rapid evolution of generative models has led to their integration across various fields, including password guessing, aiming to generate passwords that resemble human-created ones in complexity, structure, and patterns. Despite generative models' promise, inconsistencies in prior research and a lack of rigorous evaluation have hindered a comprehensive understanding of their true potential. In this paper, we introduce MAYA, a unified, customizable, plug-and-play password benchmarking framework. MAYA provides a standardized approach for evaluating generative password-guessing models through a rigorous set of advanced testing scenarios and a collection of eight real-life password datasets. Using MAYA, we comprehensively evaluate six state-of-the-art approaches, which have been re-implemented and adapted to ensure standardization, for a total of over 15,000 hours of computation. Our findings indicate that these models effectively capture different aspects of human password distribution and exhibit strong generalization capabilities. However, their effectiveness varies significantly with long and complex passwords. Throughout our evaluation, sequential models consistently outperform other generative architectures and traditional password-guessing tools, demonstrating unique capabilities in generating accurate and complex guesses. Moreover, the models learn and generate different password distributions, enabling a multi-model attack that outperforms the best individual model. By releasing MAYA, we aim to foster further research, providing the community with a new tool to consistently and reliably benchmark password-generation techniques. Our framework is publicly available at https://github.com/williamcorrias/MAYA-Password-Benchmarking
Problem

Research questions and friction points this paper is trying to address.

Standardizing evaluation of generative password-guessing models
Addressing inconsistencies in prior password generation research
Comparing effectiveness of models on complex passwords
Innovation

Methods, ideas, or system contributions that make the work stand out.

MAYA: unified customizable password benchmarking framework
Standardized evaluation with advanced testing scenarios
Multi-model attack outperforms individual models
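The multi-model attack works because different architectures learn different slices of the password distribution, so the union of their guess lists covers more of the test set than any single list. One simple way to combine ranked guess lists is round-robin interleaving with deduplication; this is a minimal sketch of that idea (the merging strategy and names here are illustrative assumptions, not MAYA's actual implementation):

```python
from itertools import zip_longest

def merge_guesses(*model_outputs):
    """Interleave ranked guess lists from several models, dropping duplicates.

    Each input list is assumed to be ordered from most to least likely,
    so interleaving preserves a rough joint ranking under a shared budget.
    """
    merged, seen = [], set()
    for batch in zip_longest(*model_outputs):
        for guess in batch:
            if guess is not None and guess not in seen:
                seen.add(guess)
                merged.append(guess)
    return merged

# Toy ranked outputs from two hypothetical models.
rnn_guesses = ["123456", "password", "iloveyou"]
vae_guesses = ["qwerty", "123456", "dragon"]
combined = merge_guesses(rnn_guesses, vae_guesses)
# -> ["123456", "qwerty", "password", "iloveyou", "dragon"]
```

Because duplicates across models are collapsed, the merged list spends the guess budget on complementary candidates, which is what lets the ensemble beat the best individual model.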