LLM-Assisted Web Measurements

📅 2025-10-09
🤖 AI Summary
Problem: Internet security and privacy measurement studies are hindered by the lack of semantically annotated public website lists, which impedes targeted, category-specific measurements. Method: This paper presents the first systematic evaluation of large language models (LLMs) for semantic website classification. We construct a domain-specific annotated dataset and use prompt engineering with few-shot learning to identify website functionality and categories automatically and with high accuracy. Contribution/Results: Experiments demonstrate that LLMs significantly outperform traditional classification methods on multi-class website categorization. The generated semantic labels enable high-quality, reproducible measurement studies whose conclusions are consistent with prior work, while substantially improving the efficiency of target website discovery. Our work establishes LLMs as a feasible and effective new paradigm for Internet measurement.
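The few-shot prompting approach mentioned in the summary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the category names, the prompt wording, the example domains, and the helper names `build_few_shot_prompt` and `parse_label` are hypothetical and are not taken from the paper. The sketch only builds the prompt text and normalizes a model reply; sending the prompt to an actual LLM is left out.

```python
# Hypothetical label set; the paper's actual categories are not listed in this summary.
CATEGORIES = ("news", "shopping", "banking", "other")


def build_few_shot_prompt(examples, target_domain, target_text, categories=CATEGORIES):
    """Build a few-shot classification prompt.

    `examples` is a sequence of (domain, page_text, label) tuples that are
    shown to the model as solved cases before the target site.
    """
    lines = [
        "Classify the website into exactly one category.",
        "Categories: " + ", ".join(categories),
        "",
    ]
    for domain, text, label in examples:
        lines += [f"Domain: {domain}", f"Text: {text}", f"Category: {label}", ""]
    # The prompt ends with an open "Category:" slot for the model to complete.
    lines += [f"Domain: {target_domain}", f"Text: {target_text}", "Category:"]
    return "\n".join(lines)


def parse_label(reply, categories=CATEGORIES, fallback="other"):
    """Map a free-text model reply onto the label set, or fall back."""
    cleaned = reply.strip().lower().rstrip(".")
    return cleaned if cleaned in categories else fallback


if __name__ == "__main__":
    shots = [("shop.example", "Buy shoes online", "shopping")]
    print(build_few_shot_prompt(shots, "news.example", "Latest headlines"))
```

Restricting the parsed output to a fixed label set is one possible design choice: it keeps downstream measurement code from having to handle free-form replies.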

📝 Abstract
Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites commonly used as measurement targets are unlabeled and lack semantic information about the nature of the sites they include. This limitation makes targeted measurements challenging, as researchers often need to rely on ad-hoc techniques to bias their datasets toward specific categories of interest. In this paper, we investigate the use of Large Language Models (LLMs) as a means to enable targeted web measurement studies through their semantic understanding capabilities. Building on prior literature, we identify key website classification tasks relevant to web measurements and construct datasets to systematically evaluate the performance of different LLMs on these tasks. Our evaluation shows that LLMs can achieve strong performance across multiple classification scenarios. We then conduct LLM-assisted web measurement studies inspired by prior work and rigorously assess the validity of the resulting research inferences. These studies show that LLMs can serve as a practical tool for analyzing security and privacy trends on the Web.

Problem

Research questions and friction points this paper is trying to address.

Classifying websites semantically for targeted security measurements
Overcoming unlabeled website lists lacking categorical information
Evaluating LLMs' capability to enable precise web measurement studies

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs classify websites for targeted web measurements
Systematic evaluation of LLMs on classification tasks
LLMs enable security and privacy trend analysis
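
Once a top list carries category labels, selecting targets for a category-specific measurement reduces to a rank-preserving filter. The sketch below assumes a ranked list of domains and a domain-to-label mapping produced by some classifier; the function name `select_targets`, the example domains, and the labels are hypothetical and do not come from the paper.

```python
from itertools import islice


def select_targets(ranked_domains, labels, category, limit):
    """Return up to `limit` domains labeled `category`, in rank order.

    `ranked_domains` is a top list ordered by popularity and `labels` maps a
    domain to its assigned category. Unlabeled domains are skipped.
    """
    matches = (d for d in ranked_domains if labels.get(d) == category)
    return list(islice(matches, limit))


if __name__ == "__main__":
    ranked = ["a.example", "b.example", "c.example", "d.example"]
    labels = {"a.example": "news", "b.example": "shopping", "d.example": "news"}
    print(select_targets(ranked, labels, "news", 5))
```

Because the filter preserves the original ranking, the selected targets remain the most popular sites within the chosen category.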