MirrorFuzz: Leveraging LLM and Shared Bugs for Deep Learning Framework APIs Fuzzing

📅 2025-10-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep learning frameworks often expose similar APIs, so a bug in one API can recur in analogous APIs of other frameworks. MirrorFuzz is an automated API fuzzing tool that targets these shared bugs: it collects historical bug data to identify potentially buggy APIs, matches each one with similar APIs within and across frameworks, and uses large language models (LLMs) to synthesize test code that tries to trigger analogous bugs. Evaluated on four frameworks (TensorFlow, PyTorch, OneFlow, and Jittor), it discovered 315 bugs, 262 of them newly found; 80 have been fixed, with 52 of these assigned CNVD IDs. Compared with state-of-the-art methods, MirrorFuzz improved code coverage by 39.92% on TensorFlow and 98.20% on PyTorch.

📝 Abstract

Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and security. While numerous techniques have been proposed to detect bugs in DL frameworks, research exploring common API patterns across frameworks and the potential risks they entail remains limited. Notably, many DL frameworks expose similar APIs with overlapping input parameters and functionalities, rendering them vulnerable to shared bugs, where a flaw in one API may extend to analogous APIs in other frameworks. To address this challenge, we propose MirrorFuzz, an automated API fuzzing solution to discover shared bugs in DL frameworks. MirrorFuzz operates in three stages: First, MirrorFuzz collects historical bug data for each API within a DL framework to identify potentially buggy APIs. Second, it matches each buggy API in a specific framework with similar APIs within and across other DL frameworks. Third, it employs large language models (LLMs) to synthesize code for the API under test, leveraging the historical bug data of similar APIs to trigger analogous bugs across APIs. We implement MirrorFuzz and evaluate it on four popular DL frameworks (TensorFlow, PyTorch, OneFlow, and Jittor). Extensive evaluation demonstrates that MirrorFuzz improves code coverage by 39.92% and 98.20% compared to state-of-the-art methods on TensorFlow and PyTorch, respectively. Moreover, MirrorFuzz discovers 315 bugs, 262 of which are newly found, and 80 bugs are fixed, with 52 of these bugs assigned CNVD IDs.
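
The second stage, matching a buggy API with similar APIs, can be illustrated with a minimal sketch. The abstract does not say how MirrorFuzz measures similarity, so the metric below (an equal blend of short-name similarity and parameter-name overlap), the `ApiSignature` type, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ApiSignature:
    """Hypothetical description of one framework API."""
    framework: str
    name: str          # fully qualified name
    params: frozenset  # parameter names


def similarity(a: ApiSignature, b: ApiSignature) -> float:
    """Assumed metric: mean of short-name similarity and parameter Jaccard overlap."""
    short_a = a.name.rsplit(".", 1)[-1].lower()
    short_b = b.name.rsplit(".", 1)[-1].lower()
    name_score = SequenceMatcher(None, short_a, short_b).ratio()
    union = a.params | b.params
    param_score = len(a.params & b.params) / len(union) if union else 0.0
    return 0.5 * name_score + 0.5 * param_score


def match_similar(buggy, candidates, threshold=0.6):
    """Return (score, api) pairs at or above the threshold, best match first."""
    scored = [(similarity(buggy, c), c) for c in candidates if c != buggy]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Calling `match_similar` with an API that has historical bug reports and a list of candidate APIs from the same or other frameworks yields the candidates worth testing for an analogous bug. A real system would likely use richer signals, such as documentation text, than this purely lexical comparison.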

Problem

Research questions and friction points this paper is trying to address.

Detecting shared bugs across similar APIs in deep learning frameworks

Automating API fuzzing using LLMs and historical bug data

Improving code coverage and bug discovery in DL frameworks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages historical bug data to identify vulnerable APIs

Matches similar APIs across frameworks to find shared bugs

Uses LLMs to synthesize code triggering analogous bugs
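
As a rough illustration of the last point, historical bug data from similar APIs could be folded into an LLM prompt along the following lines. The `BugReport` fields and the prompt wording are hypothetical; this is a sketch under those assumptions, not the prompt MirrorFuzz uses, and it stops short of calling any LLM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BugReport:
    """Hypothetical record of one historical bug for a similar API."""
    api: str           # API the bug was originally reported against
    summary: str       # one-line description of the failure
    trigger_code: str  # minimal snippet that reproduced it


def build_prompt(target_api: str, reports: Sequence[BugReport]) -> str:
    """Assemble a prompt asking for test code that mirrors known bugs.

    The wording is illustrative only.
    """
    if not reports:
        raise ValueError("at least one historical bug report is required")
    sections = []
    for i, report in enumerate(reports, start=1):
        sections.append(
            f"Bug {i} (reported against {report.api}): {report.summary}\n"
            f"Triggering code:\n{report.trigger_code}"
        )
    history = "\n\n".join(sections)
    return (
        f"The following bugs were found in APIs similar to {target_api}.\n\n"
        f"{history}\n\n"
        f"Write a short, self-contained program that calls {target_api} "
        "with inputs likely to trigger an analogous bug."
    )
```

The returned string would be sent to an LLM, and the generated program executed against the target framework to check for crashes or inconsistent results.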

🔎 Similar Papers

Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries