Unlocking User-oriented Pages: Intention-driven Black-box Scanner for Real-world Web Applications

📅 2025-04-29
🤖 AI Summary
Existing black-box scanners achieve limited test coverage and vulnerability detection because they do not understand the semantics of user interaction, particularly on deep functional pages that require multi-step navigation. This work proposes Hoyen, a black-box scanner that uses a large language model (LLM) to infer user intention from the semantic content of web pages and uses that inference to guide crawling, rather than relying only on page structure or random exploration. In an evaluation on 12 open-source web applications against 6 representative tools, Hoyen achieves about twice the coverage of the other scanners on average, directs over 90% of its requests to core application functionality, and detects more vulnerabilities than the other scanners, including unique vulnerabilities in well-known web applications.

📝 Abstract
Black-box scanners have played a significant role in detecting vulnerabilities in web applications. A key focus in current black-box scanning is increasing test coverage (i.e., accessing more web pages). However, because many web applications are user-oriented, some deep pages can only be reached through complex user interactions, which existing black-box scanners struggle to perform. To fill this gap, a key insight is that web pages contain a wealth of semantic information that can aid in understanding potential user intention. Based on this insight, we propose Hoyen, a black-box scanner that uses a large language model (LLM) to predict user intention and provide guidance for expanding the scanning scope. Hoyen has been evaluated on 12 popular open-source web applications and compared with 6 representative tools. The results demonstrate that Hoyen explores web applications comprehensively, expanding the attack surface and achieving about 2x the coverage of other scanners on average, with high request accuracy. Furthermore, over 90% of Hoyen's requests target the core functionality of the application, and Hoyen detects more vulnerabilities than the other scanners, including unique vulnerabilities in well-known web applications. Our data and code are available at https://hoyen.tjunsl.com/
Problem

Research questions and friction points this paper is trying to address.

Detect vulnerabilities in user-oriented web pages
Access deep pages via complex user interactions
Improve scanning coverage using intention prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM to predict user intentions
Guides scanner to explore deep pages
Achieves higher coverage and accuracy
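
To make the idea of intention-guided exploration concrete, the sketch below shows one way a crawler could prioritize links by a predicted "user intention" score instead of visiting them in discovery order. It is an illustration only and is not Hoyen's implementation: the in-memory `SITE` graph, the `keyword_intent_score` function (a keyword heuristic standing in for an LLM call), and the `intent_guided_crawl` function are all hypothetical names introduced here.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical toy site: page -> list of (link text, target page).
SITE: Dict[str, List[Tuple[str, str]]] = {
    "/": [("About us", "/about"), ("Log in", "/login"), ("Privacy policy", "/privacy")],
    "/login": [("Dashboard", "/dashboard")],
    "/dashboard": [("Create new post", "/posts/new"), ("Help", "/help")],
    "/about": [],
    "/privacy": [],
    "/posts/new": [],
    "/help": [],
}


def keyword_intent_score(link_text: str) -> float:
    """Stand-in for an LLM-based intention predictor (assumption).

    Returns a high score for link text that suggests a core user action
    and a low score otherwise.
    """
    keywords = ("log in", "dashboard", "create", "edit", "upload", "settings")
    text = link_text.lower()
    return 1.0 if any(k in text for k in keywords) else 0.1


def intent_guided_crawl(
    site: Dict[str, List[Tuple[str, str]]],
    start: str,
    score: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Visit up to `budget` pages, always expanding the highest-scored link first."""
    visited: List[str] = []
    seen = {start}
    # Min-heap of (negated score, insertion order, page); the insertion
    # order breaks ties so equally scored pages are visited first-come.
    frontier: List[Tuple[float, int, str]] = [(0.0, 0, start)]
    counter = 1
    while frontier and len(visited) < budget:
        _, _, page = heapq.heappop(frontier)
        visited.append(page)
        for link_text, target in site.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score(link_text), counter, target))
                counter += 1
    return visited


if __name__ == "__main__":
    print(intent_guided_crawl(SITE, "/", keyword_intent_score, budget=4))
```

With a budget of four pages, this sketch reaches the deep "/posts/new" page before spending any requests on "/about" or "/privacy", which a breadth-first crawl over the same toy graph would visit first. A real scanner would replace the keyword heuristic with a model call and would also have to fill forms and manage sessions, none of which is modeled here.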
Authors

Weizhe Wang, Tianjin University
Yao Zhang, Tianjin University
Kaitai Liang, Delft University of Technology
Guangquan Xu, Tianjin University (research interests: Cyber Security, IoT Security, Trust Management, Trusted Computing)
Hongpeng Bai, Tianjin University
Qingyang Yan, Tianjin University
Xi Zheng
Bin Wu, Tianjin University