Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing directed grey-box fuzzing (DGF) relies on physical path distance between seeds and targets, ignoring semantic logical relationships among code segments, which leads to inaccurate targeting and redundant guidance in complex binaries. Method: We propose "attention distance", the first integration of large language model (LLM) attention mechanisms into fuzzing, to model semantic logical associations among code elements and replace traditional path-distance metrics. Our approach combines LLM-based attention analysis with static and dynamic program feature extraction, and is plug-and-play compatible with frameworks such as AFLGo. Contribution/Results: Evaluated on 38 real-world vulnerability reproductions, our method achieves a 3.43× average speedup over baseline DGF, outperforming DAFL and WindRanger by 2.89× and 7.13×, respectively. Moreover, it generalizes effectively, enhancing the performance of other state-of-the-art fuzzers through transferable guidance.

📝 Abstract
In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization and excellent detection performance. However, existing approaches measure only the physical distance between seed execution paths and target locations, overlooking logical relationships among code segments. This omission can yield redundant or misleading guidance in complex binaries, weakening DGF's real-world effectiveness. To address this, we introduce attention distance, a novel metric that leverages a large language model's contextual analysis to compute attention scores between code elements and reveal their intrinsic connections. Under the same AFLGo configuration, without altering any fuzzing components other than the distance metric, replacing physical distances with attention distances across 38 real vulnerability reproduction experiments delivers a 3.43× average increase in testing efficiency over the traditional method. Compared to state-of-the-art directed fuzzers DAFL and WindRanger, our approach achieves 2.89× and 7.13× improvements, respectively. To further validate the generalizability of attention distance, we integrate it into DAFL and WindRanger, where it also consistently enhances their original performance. All related code and datasets are publicly available at https://github.com/TheBinKing/Attention_Distance.git.
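The core idea, converting pairwise attention scores between code elements into a distance that guides seed prioritization, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function names, the score values, and the inverse-score mapping are all assumptions; the paper's actual scoring pipeline uses LLM attention analysis plus static and dynamic program features.

```python
# Illustrative sketch (assumed, not the authors' code): given pairwise
# attention scores between code elements (e.g., averaged over an LLM's
# attention heads), map them to distances so that strongly associated
# pairs get SMALL distances, mirroring how DGF favors small path distances.

def attention_distance(scores, eps=1e-6):
    """Map attention scores in [0, 1] to distances.

    High attention (strong semantic association) yields a small distance,
    so a directed fuzzer minimizing this metric steers seeds toward
    semantically related target code.
    """
    return {pair: 1.0 / (score + eps) for pair, score in scores.items()}

# Hypothetical scores between a seed's covered function and two candidates.
scores = {
    ("parse_header", "check_len"): 0.82,  # strongly related pair
    ("parse_header", "log_debug"): 0.05,  # weakly related pair
}

dists = attention_distance(scores)
# The strongly related pair receives the smaller distance, so seeds
# reaching parse_header would be prioritized toward check_len.
assert dists[("parse_header", "check_len")] < dists[("parse_header", "log_debug")]
```

Because the output has the same shape as a conventional seed-to-target distance, it can drop into a distance-driven power schedule (as in AFLGo) without touching the rest of the fuzzing loop, which is the plug-and-play property the abstract describes.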
Problem

Research questions and friction points this paper is trying to address.

Measures logical code relationships for better directed fuzzing guidance
Improves testing efficiency over traditional physical distance metrics
Enhances existing directed fuzzer performance with a novel attention metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces attention distance metric using large language models
Replaces physical distances with attention scores for guidance
Integrates into existing fuzzing frameworks without altering components
Wang Bin
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Ao Yang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Kedan Li
University of Illinois at Urbana-Champaign
Aofan Liu
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Hui Li
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Guibo Luo
Peking University
Weixiang Huang
China Mobile Internet Co.
Yan Zhuang
China Mobile Internet Co.