deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses

📅 2025-06-18

🤖 AI Summary

Rust’s `unsafe` code can introduce memory-safety vulnerabilities, yet existing detection tools handle Rust-specific constructs such as generics, traits, and macros poorly and rely heavily on manual intervention. To address this, the paper presents deepSURF, which combines static analysis with large language model (LLM)-guided generation of fuzzing harnesses. The method replaces generics with custom types that carry tailored trait implementations, and it uses LLMs to augment harnesses dynamically, so the fuzzer can simulate realistic user behavior and explore complex API interactions. It integrates precise Rust type resolution, custom type injection, and compatibility with AFL++ and LibFuzzer. Evaluated on 27 real-world crates, deepSURF rediscovered 20 known vulnerabilities and found 6 previously unknown ones, detecting more memory-safety flaws in Rust’s `unsafe` code than state-of-the-art tools.

📝 Abstract

Although Rust ensures memory safety by default, it also permits the use of unsafe code, which can introduce memory safety vulnerabilities if misused. Unfortunately, existing tools for detecting memory bugs in Rust typically exhibit limited detection capabilities, inadequately handle Rust-specific types, or rely heavily on manual intervention. To address these limitations, we present deepSURF, a tool that integrates static analysis with Large Language Model (LLM)-guided fuzzing harness generation to effectively identify memory safety vulnerabilities in Rust libraries, specifically targeting unsafe code. deepSURF introduces a novel approach for handling generics by substituting them with custom types and generating tailored implementations for the required traits, enabling the fuzzer to simulate user-defined behaviors within the fuzzed library. Additionally, deepSURF employs LLMs to augment fuzzing harnesses dynamically, facilitating exploration of complex API interactions and significantly increasing the likelihood of exposing memory safety vulnerabilities. We evaluated deepSURF on 27 real-world Rust crates, successfully rediscovering 20 known memory safety bugs and uncovering 6 previously unknown vulnerabilities, demonstrating clear improvements over state-of-the-art tools.
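The generic-substitution idea in the abstract can be illustrated with a small sketch. This is not deepSURF's generated code: `FuzzBytes`, `CustomKey`, `largest`, and `harness` are hypothetical names chosen for illustration, and `largest` stands in for a generic library API under test. The point is that a custom type whose `Ord` implementation is driven by fuzzer bytes lets the fuzzer play the role of arbitrary, even inconsistent, user code.

```rust
use std::cell::Cell;
use std::cmp::Ordering;
use std::rc::Rc;

/// Hypothetical cursor over the fuzzer-provided input.
struct FuzzBytes {
    data: Vec<u8>,
    pos: Cell<usize>,
}

impl FuzzBytes {
    fn new(input: &[u8]) -> Self {
        FuzzBytes { data: input.to_vec(), pos: Cell::new(0) }
    }

    /// Returns the next input byte, or 0 once the input is exhausted.
    fn next_byte(&self) -> u8 {
        let i = self.pos.get();
        self.pos.set(i + 1);
        self.data.get(i).copied().unwrap_or(0)
    }
}

/// Hypothetical custom type substituted for a generic parameter `T: Ord`.
struct CustomKey {
    id: u32,
    src: Rc<FuzzBytes>,
}

impl Ord for CustomKey {
    /// The comparison result comes from the fuzz input, not from the values,
    /// so the fuzzer decides how this "user-defined" trait method behaves.
    fn cmp(&self, _other: &Self) -> Ordering {
        match self.src.next_byte() % 3 {
            0 => Ordering::Less,
            1 => Ordering::Equal,
            _ => Ordering::Greater,
        }
    }
}

impl PartialOrd for CustomKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for CustomKey {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for CustomKey {}

/// Stand-in for a generic library function under test (an assumption).
fn largest<T: Ord>(items: &[T]) -> Option<&T> {
    let mut best = items.first()?;
    for item in &items[1..] {
        if item.cmp(best) == Ordering::Greater {
            best = item;
        }
    }
    Some(best)
}

/// Harness body: the first byte picks the element count, the rest drive `cmp`.
fn harness(input: &[u8]) -> Option<u32> {
    let src = Rc::new(FuzzBytes::new(input));
    let count = u32::from(src.next_byte() % 8);
    let keys: Vec<CustomKey> = (0..count)
        .map(|id| CustomKey { id, src: Rc::clone(&src) })
        .collect();
    largest(&keys).map(|key| key.id)
}
```

Calling `harness(&[3, 2, 0])` builds three keys and lets the remaining bytes decide each comparison. In a real setup the same body would sit inside a fuzzer entry point, and the target would be library code containing `unsafe` blocks that may assume a well-behaved `Ord`.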

Problem

Research questions and friction points this paper is trying to address.

Detect memory safety vulnerabilities in Rust unsafe code

Overcome the limited detection capabilities of existing Rust bug-detection tools

Handle Rust-specific types and generics effectively in fuzzing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines static analysis with LLM-guided fuzzing

Handles generics via custom type substitution

Uses LLMs to dynamically augment fuzzing harnesses
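To make "complex API interactions" concrete, here is a hand-written sketch of a harness that decodes fuzz input into a sequence of API calls. The summary does not describe the shape of deepSURF's LLM-augmented harnesses, so this is only an assumption about what a multi-call harness could look like: `Op`, `decode_ops`, and `run_sequence` are hypothetical names, and the standard library's `VecDeque` stands in for the library under test.

```rust
use std::collections::VecDeque;

/// Hypothetical set of API calls the harness can issue against the target.
enum Op {
    PushBack(u8),
    PushFront(u8),
    PopBack,
    PopFront,
    Truncate(usize),
}

/// Decodes the input two bytes at a time: a selector and an argument.
/// A missing argument byte defaults to 0.
fn decode_ops(input: &[u8]) -> Vec<Op> {
    input
        .chunks(2)
        .map(|chunk| {
            let arg = chunk.get(1).copied().unwrap_or(0);
            match chunk[0] % 5 {
                0 => Op::PushBack(arg),
                1 => Op::PushFront(arg),
                2 => Op::PopBack,
                3 => Op::PopFront,
                _ => Op::Truncate(usize::from(arg)),
            }
        })
        .collect()
}

/// Replays the decoded calls against the stand-in target and returns its
/// final contents, so different inputs exercise different call orderings.
fn run_sequence(input: &[u8]) -> Vec<u8> {
    let mut target: VecDeque<u8> = VecDeque::new();
    for op in decode_ops(input) {
        match op {
            Op::PushBack(value) => target.push_back(value),
            Op::PushFront(value) => target.push_front(value),
            Op::PopBack => {
                target.pop_back();
            }
            Op::PopFront => {
                target.pop_front();
            }
            Op::Truncate(len) => target.truncate(len),
        }
    }
    target.into_iter().collect()
}
```

For example, `run_sequence(&[0, 7, 1, 9, 0, 3, 3, 0])` pushes 7 to the back, 9 to the front, 3 to the back, then pops the front. Letting the input choose both the calls and their order is what allows a fuzzer to reach states that a single-call harness would never produce.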

🔎 Similar Papers

On the Challenges of Fuzzing Techniques via Large Language Models