PhishingHook: Catching Phishing Ethereum Smart Contracts leveraging EVM Opcodes

📅 2025-06-24
🤖 AI Summary
To avoid the privacy leakage caused by transaction replay in Ethereum phishing detection, this paper proposes PhishingHook, a static analysis framework that classifies smart contracts from their EVM bytecode and opcode sequences, before a contract is deployed or interacted with. It compares 16 machine learning techniques across four categories: histogram similarity classifiers, vision models (CNN/ViT), language models (CodeBERT), and vulnerability detection models. Evaluated on 7,000 real-world malicious contracts, the models reach about 90% accuracy on average. Its key contributions are: (1) eliminating reliance on transaction replay, thereby avoiding exposure of sensitive user data; (2) an EVM-opcode-centric approach to detecting phishing smart contracts; and (3) open-sourcing both the implementation code and the benchmark dataset to support reproducible research.

📝 Abstract
The Ethereum Virtual Machine (EVM) is a decentralized computing engine that enables the Ethereum blockchain to execute smart contracts and decentralized applications (dApps). The increasing adoption of Ethereum has sparked a rise in phishing activities. Phishing attacks often target users through deceptive means, e.g., fake websites, wallet scams, or malicious smart contracts, aiming to steal sensitive information or funds. Timely detection of phishing activities in the EVM is therefore crucial to preserve user trust and network integrity. Some state-of-the-art approaches to phishing detection in smart contracts rely on the online analysis of transactions and their traces. However, replaying transactions often exposes sensitive user data and interactions, which raises several security concerns. In this work, we present PhishingHook, a framework that applies machine learning techniques to detect phishing activities in smart contracts by directly analyzing the contract's bytecode and its constituent opcodes. We evaluate the efficacy of such techniques in identifying malicious patterns, suspicious function calls, or anomalous behaviors within the contract's code itself, before it is deployed or interacted with. We experimentally compare 16 techniques, belonging to four main categories (Histogram Similarity Classifiers, Vision Models, Language Models and Vulnerability Detection Models), using 7,000 real-world malware smart contracts. Our results demonstrate the effectiveness of PhishingHook at phishing classification, with about 90% average accuracy across all the models. We support experimental reproducibility, and we release our code and datasets to the research community.
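
To make "directly analyzing the contract's bytecode and its constituent opcodes" concrete, here is a minimal Python sketch that walks a hex-encoded bytecode string, skips the immediate data that follows each PUSH instruction, and counts the resulting opcode mnemonics. It is an illustration under stated assumptions, not PhishingHook's released code: the function names `disassemble` and `opcode_histogram` are hypothetical, and the opcode table is a deliberately partial subset of the EVM instruction set.

```python
from collections import Counter

# Partial EVM opcode table, for illustration only. A real tool would
# cover the full instruction set.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x10: "LT", 0x14: "EQ", 0x15: "ISZERO",
    0x33: "CALLER", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x5F: "PUSH0",
    0xF1: "CALL", 0xF3: "RETURN", 0xF4: "DELEGATECALL",
    0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}


def disassemble(bytecode_hex):
    """Turn a hex bytecode string into a list of opcode mnemonics."""
    if bytecode_hex.startswith(("0x", "0X")):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    ops = []
    pc = 0
    while pc < len(code):
        byte = code[pc]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 bytes of immediate data, which
            # must be skipped so it is not decoded as instructions.
            width = byte - 0x5F
            ops.append(f"PUSH{width}")
            pc += 1 + width
        elif 0x80 <= byte <= 0x8F:
            ops.append(f"DUP{byte - 0x7F}")
            pc += 1
        elif 0x90 <= byte <= 0x9F:
            ops.append(f"SWAP{byte - 0x8F}")
            pc += 1
        else:
            # Bytes missing from the partial table are kept as placeholders.
            ops.append(OPCODES.get(byte, f"UNKNOWN_0x{byte:02x}"))
            pc += 1
    return ops


def opcode_histogram(bytecode_hex):
    """Count how often each opcode mnemonic occurs in the bytecode."""
    return Counter(disassemble(bytecode_hex))


if __name__ == "__main__":
    sample = "0x6080604052"  # PUSH1 0x80, PUSH1 0x40, MSTORE
    print(disassemble(sample))
    print(opcode_histogram(sample))
```

For the sample input, `disassemble` yields `['PUSH1', 'PUSH1', 'MSTORE']`. A histogram like this is one plausible input for a frequency-based classifier, while the ordered mnemonic list is closer to what a sequence model would consume; how PhishingHook actually encodes its inputs is not specified in the abstract.
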
Problem

Research questions and friction points this paper is trying to address.

Detects phishing Ethereum smart contracts using EVM opcodes
Identifies malicious patterns in contract bytecode pre-deployment
Evaluates machine learning models for phishing classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning detects phishing via EVM opcodes
Analyzes contract bytecode pre-deployment for anomalies
Compares 16 techniques across four model categories, reaching about 90% average accuracy
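
One of the four categories named in the abstract is histogram similarity classification. The Python sketch below shows one simple way such a classifier could work: average the normalized opcode histograms of each class into a centroid, then assign a new contract to the class whose centroid is most similar under cosine similarity. This is an assumption-laden illustration, not the paper's method. The nearest-centroid rule, the cosine metric, the function names, and the toy histograms are all hypothetical, and the toy counts are arbitrary values with no empirical meaning.

```python
import math


def normalize(hist):
    """Convert raw opcode counts into relative frequencies."""
    total = sum(hist.values())
    return {op: n / total for op, n in hist.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(hists):
    """Average the normalized histograms of one class."""
    acc = {}
    for hist in hists:
        for op, freq in normalize(hist).items():
            acc[op] = acc.get(op, 0.0) + freq
    return {op: s / len(hists) for op, s in acc.items()}


def classify(hist, centroids):
    """Return the label whose centroid is most similar to `hist`."""
    query = normalize(hist)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Toy, hand-written histograms (NOT real data, purely illustrative).
TOY_TRAINING = {
    "phishing": [
        {"CALLER": 3, "CALL": 4, "SSTORE": 1},
        {"CALLER": 2, "CALL": 5, "SLOAD": 1},
    ],
    "benign": [
        {"SLOAD": 4, "SSTORE": 3, "ADD": 2},
        {"SLOAD": 5, "ADD": 3, "MSTORE": 1},
    ],
}
CENTROIDS = {label: centroid(h) for label, h in TOY_TRAINING.items()}

if __name__ == "__main__":
    print(classify({"CALL": 3, "CALLER": 2, "ADD": 1}, CENTROIDS))
```

Normalizing counts to frequencies keeps contract length from dominating the comparison, which is why the sketch normalizes before averaging and before scoring.
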