FidelityGPT: Correcting Decompilation Distortions with Retrieval Augmented Generation

📅 2025-10-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Decompilation suffers from semantic distortion and reduced readability when source code is unavailable, and existing methods lack robustness on complex closed-source binaries. This paper proposes a semantic-fidelity-aware decompilation repair framework: first, it introduces a distortion-aware prompting template and a dynamic semantic intensity algorithm to precisely localize distorted code lines; second, it combines variable dependency analysis with retrieval-augmented generation (RAG) to alleviate long-context modeling bottlenecks, leveraging a corpus of semantically similar code for contextual enhancement. Experiments on 620 function pairs show 89% distortion detection accuracy, a 94% repair success rate (Fix Rate), and a 64% Corrected Fix Rate, significantly outperforming DeGPT (83% FR, 37% CFR). The core contribution is the first integration of distortion-aware localization, variable-dependency-driven contextual modeling, and semantic retrieval into decompilation repair.
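The RAG step summarized above amounts to ranking a code corpus by similarity to a distorted snippet and feeding the best matches into the repair prompt. A minimal sketch, using a toy bag-of-tokens embedding (the paper's actual retrieval model and corpus are not described on this page; `embed`, `retrieve_similar`, and the sample corpus are hypothetical stand-ins):

```python
import math
from collections import Counter

def embed(code: str) -> Counter:
    # Toy lexical "embedding": a token-frequency vector.
    # FidelityGPT would use a learned code-embedding model instead.
    return Counter(code.replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_similar(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # RAG retrieval step: rank corpus functions by similarity to the
    # distorted decompiled snippet and return the top-k matches.
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = [
    "for (i = 0; i < n; i++) sum += arr[i];",
    "strcpy(dst, src);",
    "while (*p) p++;",
]
# A distorted decompiled loop should retrieve the semantically closest pattern.
hits = retrieve_similar("v3 = 0; while (v3 < n) { v4 += a[v3]; v3++; }", corpus)
```

The retrieved snippets would then be embedded into the distortion-aware prompt as repair context.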

📝 Abstract
Decompilation converts machine code into human-readable form, enabling analysis and debugging without source code. However, fidelity issues often degrade the readability and semantic accuracy of decompiled output. Existing methods, such as variable renaming or structural simplification, provide partial improvements but lack robust detection and correction, particularly for complex closed-source binaries. We present FidelityGPT, a framework that enhances decompiled code accuracy and readability by systematically detecting and correcting semantic distortions. FidelityGPT introduces distortion-aware prompt templates tailored to closed-source settings and integrates Retrieval-Augmented Generation (RAG) with a dynamic semantic intensity algorithm to locate distorted lines and retrieve semantically similar code from a database. A variable dependency algorithm further mitigates long-context limitations by analyzing redundant variables and integrating their dependencies into the prompt context. Evaluated on 620 function pairs from a binary similarity benchmark, FidelityGPT achieved an average detection accuracy of 89% and a precision of 83%. Compared to the state-of-the-art DeGPT (Fix Rate 83%, Corrected Fix Rate 37%), FidelityGPT attained 94% FR and 64% CFR, demonstrating significant gains in accuracy and readability. These results highlight its potential to advance LLM-based decompilation and reverse engineering.
Problem

Research questions and friction points this paper is trying to address.

Correcting semantic distortions in decompiled code
Enhancing decompilation accuracy for closed-source binaries
Improving readability through distortion detection and correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Retrieval-Augmented Generation to correct decompilation distortions
Integrates dynamic semantic intensity algorithm for distortion detection
Employs variable dependency algorithm to mitigate context limitations
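The variable dependency idea in the bullets above can be sketched as a pass that maps each assigned variable to the variables its right-hand side reads, so only the relevant dependency slice needs to enter the prompt context. A minimal illustration assuming Hex-Rays-style `vN` names (the paper's exact algorithm is not given on this page; `dependencies` is a hypothetical helper):

```python
import re

def dependencies(lines: list[str]) -> dict[str, set[str]]:
    # For each assigned variable, record which variables its
    # right-hand side reads (direct data dependencies).
    deps: dict[str, set[str]] = {}
    for line in lines:
        m = re.match(r"\s*(v\d+)\s*=\s*(.+);", line)
        if not m:
            continue
        target, rhs = m.group(1), m.group(2)
        deps.setdefault(target, set()).update(re.findall(r"\bv\d+\b", rhs))
    return deps

code = [
    "v1 = a + b;",
    "v2 = v1 * 2;",
    "v3 = v2 + v1;",
]
deps = dependencies(code)
```

Given a redundant or distorted variable, such a map lets the framework pull in only the lines it depends on, rather than the whole function, easing long-context limits.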
Zhiping Zhou
Tianjin University
Xiaohong Li
Tianjin University
Ruitao Feng
Southern Cross University
Yao Zhang
Tianjin University
Yuekang Li
Lecturer (Assistant Professor), University of New South Wales
Software Engineering, Software Security, AI Red Teaming
Wenbu Feng
Tianjin University
Yunqian Wang
Tianjin University
Yuqing Li
East China Normal University
Deep Learning Theory