🤖 AI Summary
This study investigates the feasibility and limitations of using large language models (LLMs) for automated Android root privilege escalation, a critical penetration-testing task, addressing gaps in efficacy, reliability, and scalability. Method: we design an integrated automation framework combining PentestGPT, OpenAI's API, and a custom web platform, deployed within Genymotion emulators, to enable end-to-end vulnerability identification, AI-driven exploit script generation, and response analysis. Contributions/Results: (1) empirical evaluation confirms that LLMs significantly accelerate privilege escalation but require human validation for accuracy and compliance; (2) we identify key capability boundaries and high false-positive rates in mobile-security contexts; (3) we propose a security and ethics framework for AI-augmented penetration testing. Overall, LLMs demonstrate practical utility in automating routine tasks but remain insufficient to replace expert human analysis in complex, high-stakes security assessments.
📝 Abstract
The rapid evolution of Artificial Intelligence (AI) and Large Language Models (LLMs) has opened new opportunities in cybersecurity, especially in exploitation automation and penetration testing. This study explores automated Android penetration testing using LLM-based tools, in particular PentestGPT, to identify and execute rooting techniques. By comparing the traditional manual rooting process with AI-generated exploitation methods, we evaluate the efficacy, reliability, and scalability of automated penetration testing in achieving high-level privilege access on Android devices. Using an Android emulator (Genymotion) as the testbed, we execute both traditional and exploit-based rooting methods, automating the process with AI-generated scripts. In addition, we build a web application that integrates OpenAI's API to generate scripts automatically from LLM-processed responses. The research assesses the effectiveness of AI-enabled exploitation by comparing automated and manual penetration-testing workflows, identifying LLM strengths and weaknesses along the way. We also offer security recommendations for AI-enabled exploitation, including ethical considerations and potential for misuse. The findings show that while LLMs can significantly streamline exploitation workflows, they require human oversight to ensure accuracy and ethical application. This study contributes to the growing body of literature on AI-powered cybersecurity and its impact on ethical hacking, security research, and mobile device security.