Shy Guys: A Light-Weight Approach to Detecting Robots on Websites

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of existing bot detection methods, which often incur high resource overhead, elevated costs, or degraded user experience, rendering them ineffective against sophisticated, disguised crawlers. To overcome these challenges, this work proposes a lightweight, fully passive detection mechanism that operates solely on standard web server logs. By integrating user-agent string parsing with heuristic analysis of favicon request behavior, the approach identifies malicious traffic without any client-side interaction. Evaluated on a real-world dataset comprising 4.6 million HTTP requests, the method achieves a detection rate of 67.7% with a false positive rate of only 3%, substantially outperforming current techniques, whose detection rates typically fall below 20%. The proposed solution thus serves as an efficient first line of defense, significantly reducing reliance on active challenge-based mechanisms.
📝 Abstract
Automated bots now account for roughly half of all web requests, and an increasing number deliberately spoof their identity, either to evade detection or to ignore robots.txt. Existing countermeasures are either resource-intensive (JavaScript challenges, CAPTCHAs), cost-prohibitive (commercial solutions), or degrade the user experience. This paper proposes a lightweight, passive approach to bot detection that combines user-agent string analysis with favicon-based heuristics, operating entirely on standard web server logs with no client-side interaction. We evaluate the method on over 4.6 million requests containing 54,945 unique user-agent strings collected from websites hosted around the world. Our approach detects 67.7% of bot traffic while maintaining a false-positive rate of 3%, outperforming state-of-the-art techniques, which detect less than 20%. This method can serve as a first line of defence, routing only genuinely ambiguous requests to active challenges and preserving the experience of legitimate users.
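The abstract describes a two-signal passive classifier: parse the user-agent string for self-declared bots, then flag clients whose user agent claims to be a browser but who never fetch the site's favicon (real browsers normally do). The following sketch illustrates that general idea over toy log lines in an extended Common Log Format; the regex, bot-token list, and verdict labels are illustrative assumptions, not the paper's exact algorithm.

```python
import re
from collections import defaultdict

# Hypothetical access-log lines (Common Log Format extended with the user agent).
LOG_LINES = [
    '1.2.3.4 - - [10/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
    '1.2.3.4 - - [10/Mar/2026:10:00:01 +0000] "GET /favicon.ico HTTP/1.1" 200 318 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
    '5.6.7.8 - - [10/Mar/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
    '9.9.9.9 - - [10/Mar/2026:10:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "python-requests/2.31"',
]

LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" \d+ \d+ "[^"]*" "([^"]*)"$')
KNOWN_BOT_TOKENS = ("bot", "crawler", "spider", "python-requests", "curl")

def classify(lines):
    """Map each client IP to 'bot' or 'likely-human' using only log data."""
    requested_favicon = defaultdict(bool)
    claims_browser = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, _method, path, ua = m.groups()
        ua_lower = ua.lower()
        if path.endswith("/favicon.ico"):
            requested_favicon[ip] = True
        # Signal 1: user-agent parsing -- self-declared bots are flagged directly.
        if any(tok in ua_lower for tok in KNOWN_BOT_TOKENS):
            claims_browser[ip] = False
        else:
            claims_browser.setdefault(ip, "mozilla" in ua_lower)
    verdict = {}
    for ip, is_browser in claims_browser.items():
        if not is_browser:
            verdict[ip] = "bot"            # declared bot user agent
        elif not requested_favicon[ip]:
            verdict[ip] = "bot"            # Signal 2: browser UA but no favicon fetch
        else:
            verdict[ip] = "likely-human"
    return verdict

print(classify(LOG_LINES))
```

In this toy run, 1.2.3.4 looks human (browser UA plus a favicon request), while 5.6.7.8 (browser UA, no favicon) and 9.9.9.9 (declared `python-requests`) are flagged as bots. Because everything is computed from logs already collected, the check adds no client-side cost, which is the property the paper emphasizes.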
Problem

Research questions and friction points this paper is trying to address.

bot detection
web traffic
user-agent spoofing
robots.txt evasion
lightweight detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

lightweight bot detection
user-agent analysis
favicon heuristics
passive detection
web server logs
Rémi Van Boxem
UCLouvain, Institute of Information and Communication Technologies, Electronics and Applied Mathematics, ICTEAM, Pôle en ingénierie informatique, INGI, Place Sainte Barbe 2, Louvain-la-Neuve, Belgium
Tom Barbette
UCLouvain
Computer Science, High-Speed Networking, NFV, SmartNIC, Load-Balancing
Cristel Pelsser
UCLouvain
Internet routing, measurements, embedded systems
Ramin Sadre
Université catholique de Louvain
Network security, Traffic modeling, SCADA networks, Network measurement, Flow monitoring