ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts

📅 2026-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the weakness of existing smart contract vulnerability detectors against adversarial obfuscation, which often stems from modeling semantic, temporal, and structural features in isolation and fusing them with simplistic strategies. To overcome this limitation, the authors propose ContractShield, a framework built around a three-level hierarchical cross-modal fusion mechanism. It uses CodeBERT with a sliding window for source code semantics, xLSTM for opcode temporal dynamics, and GATv2 for control flow graph structure, then integrates these modalities through self-attention, cross-modal attention, and adaptive weighting. In the reported experiments, ContractShield achieves a Hamming Score of 89% under obfuscation, a drop of only 1-3% relative to non-obfuscated data, and detects five major vulnerability types simultaneously with an F1-score of 91%, outperforming state-of-the-art approaches by 6-15% under adversarial conditions.
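The Hamming Score quoted above is a multi-label metric. The summary does not spell out its definition, so the sketch below assumes the common one: the per-sample overlap between predicted and true label sets (intersection over union), averaged over samples. The function name `hamming_score` and the toy labels are illustrative, not taken from the paper.

```python
def hamming_score(y_true, y_pred):
    """Mean per-sample |true AND pred| / |true OR pred| over binary label rows.

    Assumption: a sample with no true and no predicted labels counts as a
    perfect match (score 1.0). The paper may use a different convention.
    """
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        true_set = {i for i, v in enumerate(true_row) if v}
        pred_set = {i for i, v in enumerate(pred_row) if v}
        union = true_set | pred_set
        total += 1.0 if not union else len(true_set & pred_set) / len(union)
    return total / len(y_true)


# Three contracts, five vulnerability labels each (toy data).
y_true = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0]]
score = hamming_score(y_true, y_pred)  # per-sample scores: 0.5, 1.0, 1.0
```

Unlike exact-match accuracy, this metric gives partial credit when only some of a contract's vulnerability types are found, which is why it suits simultaneous detection of several types.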

📝 Abstract

Smart contracts are increasingly targeted by adversaries who use obfuscation techniques such as bogus code injection and control flow manipulation to evade vulnerability detection. Existing multimodal methods often process semantic, temporal, and structural features in isolation and fuse them with simple strategies such as concatenation. This neglects cross-modal interactions and weakens robustness, because obfuscating a single modality can sharply degrade detection accuracy. To address these challenges, we propose ContractShield, a robust multimodal framework whose fusion mechanism correlates complementary features through three levels. First, self-attention identifies vulnerability-indicating patterns within each feature space. Next, cross-modal attention establishes connections between complementary signals across modalities. Finally, adaptive weighting dynamically calibrates feature contributions based on their reliability under obfuscation. For feature extraction, ContractShield integrates (1) CodeBERT with a sliding window mechanism to capture semantic dependencies in source code, (2) extended long short-term memory (xLSTM) to model temporal dynamics in opcode sequences, and (3) GATv2 to identify structural invariants in control flow graphs (CFGs) that remain stable across obfuscation. Empirical evaluation demonstrates the resilience of ContractShield, which achieves an 89% Hamming Score with only a 1-3% drop compared to non-obfuscated data. The framework simultaneously detects five major vulnerability types with a 91% F1-score, outperforming state-of-the-art approaches by 6-15% under adversarial conditions.
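The three fusion levels described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes each modality has already been encoded into vectors of one shared dimension, it omits learned projection matrices, and the names `attend`, `fuse`, and `reliability_vector` are hypothetical stand-ins for trained components.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(modalities, reliability_vector):
    # Level 1: self-attention inside each modality.
    refined = {name: attend(x, x, x) for name, x in modalities.items()}

    # Level 2: cross-modal attention. Each modality queries the others,
    # and the retrieved context is added back as a residual.
    crossed = {}
    for name, rows in refined.items():
        others = [r for other, rs in refined.items() if other != name for r in rs]
        context = attend(rows, others, others)
        mixed = [[a + b for a, b in zip(r, c)] for r, c in zip(rows, context)]
        crossed[name] = mean_pool(mixed)

    # Level 3: adaptive weighting. A scalar score per modality (here a dot
    # product with a hypothetical learned vector) is normalized by softmax.
    names = list(crossed)
    weights = softmax([dot(crossed[n], reliability_vector) for n in names])
    dim = len(reliability_vector)
    fused = [sum(w * crossed[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return fused, dict(zip(names, weights))


# Toy encoder outputs (4-dimensional) standing in for the three extractors.
features = {
    "semantic": [[0.2, 0.1, 0.0, 0.5], [0.4, 0.3, 0.1, 0.0]],
    "temporal": [[0.0, 0.6, 0.2, 0.1]],
    "structural": [[0.3, 0.3, 0.3, 0.3], [0.1, 0.0, 0.9, 0.2], [0.5, 0.5, 0.0, 0.0]],
}
fused, weights = fuse(features, [0.5, -0.2, 0.1, 0.3])
```

Because the modality weights come from a softmax, they always sum to one. In a trained model of this shape, a modality whose features are distorted by obfuscation could receive a smaller weight instead of dominating the fused vector, which is the intuition behind weighting by reliability rather than concatenating.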

Problem

Research questions and friction points this paper is trying to address.

smart contract vulnerability detection

code obfuscation

multimodal fusion

cross-modal interaction

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical cross-modal fusion

adaptive weighting

multimodal vulnerability detection