🤖 AI Summary
This work addresses two limitations: traditional password guessing approaches emulate real-world attacker behavior poorly, and existing large language model (LLM)-based methods rely heavily on handcrafted prompts. The authors apply OpenEvolve, an open-source system that combines MAP-Elites quality-diversity search with island-based population evolution, to bring LLM-guided prompt evolution to password guessing and optimize prompts automatically, without manual prompt engineering. On a RockYou-derived test set, the evolved prompts raise the cracking rate from 2.02% to 8.48% and generate passwords whose character distributions more closely mirror those of real users. Experiments with Qwen3-8B (local), Gemini-2.5 Flash (cloud), and a two-model ensemble of frontier LLMs show attack performance gains across configurations, making LLM-driven password auditing more effective.
📝 Abstract
Passwords remain a dominant authentication method, yet their security is routinely undermined by predictable user choices and large-scale credential leaks. Automated password guessing is a key tool for stress-testing password policies and modeling attacker behavior. This paper applies LLM-driven evolutionary computation to automatically optimize prompts for an LLM-based password guessing framework. Using OpenEvolve, an open-source system that combines MAP-Elites quality-diversity search with an island population model, we evolve prompts that maximize the cracking rate on a RockYou-derived test set. We evaluate three configurations: a local setup with Qwen3-8B, a single compact cloud model, Gemini-2.5 Flash, and a two-model ensemble of frontier LLMs. The approach raises the cracking rate from 2.02% to 8.48%. Character distribution analysis further confirms that evolved prompts produce statistically more realistic passwords. Automated prompt evolution is a low-barrier yet effective way to strengthen LLM-based password auditing, and it underlines how attack pipelines tend to improve through automation.
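To make the moving parts in the abstract concrete, the following is a minimal Python sketch of the three ideas it names: a cracking-rate fitness, a MAP-Elites style archive that keeps the best prompt per descriptor cell, and a ring migration step between islands. It is not OpenEvolve's API or the authors' implementation. The names `cracking_rate`, `EliteArchive`, and `migrate` are hypothetical, and so are the assumptions that the cracking rate is measured over unique test passwords and that descriptor cells are integer tuples.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[int, ...]  # assumed descriptor cell, e.g. bucketed prompt features


def cracking_rate(guesses: Iterable[str], test_set: Iterable[str]) -> float:
    """Fraction of unique test passwords that appear among the guesses.

    Assumption: the rate is computed over unique passwords; the paper may
    count duplicates differently.
    """
    targets = set(test_set)
    if not targets:
        return 0.0
    return len(targets & set(guesses)) / len(targets)


class EliteArchive:
    """MAP-Elites style archive: one best-scoring prompt per descriptor cell."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Tuple[float, str]] = {}

    def offer(self, prompt: str, score: float, cell: Cell) -> bool:
        """Store the prompt if its cell is empty or it beats the current elite."""
        best = self.cells.get(cell)
        if best is None or score > best[0]:
            self.cells[cell] = (score, prompt)
            return True
        return False


def migrate(islands: List[EliteArchive]) -> None:
    """Ring migration: each island offers its top elite to the next island."""
    # Snapshot the top elites first so the result does not depend on order.
    bests: List[Optional[Tuple[str, float, Cell]]] = []
    for island in islands:
        if island.cells:
            cell, (score, prompt) = max(
                island.cells.items(), key=lambda kv: kv[1][0]
            )
            bests.append((prompt, score, cell))
        else:
            bests.append(None)
    for i, best in enumerate(bests):
        if best is not None:
            islands[(i + 1) % len(islands)].offer(*best)
```

In a full loop, an LLM would propose a mutated prompt, the guessing model would generate candidate passwords from it, `cracking_rate` would score those candidates against the held-out test set, and the result would be passed to `offer` on the island that produced it. Keeping one elite per cell is what preserves diversity: a weaker prompt survives as long as it is the best in its own cell.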