Lares: LLM-driven Code Slice Semantic Search for Patch Presence Testing

📅 2025-11-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Determining whether a known (1-day) vulnerability has already been patched in a target binary is difficult: code reuse spreads vulnerable functions widely, and existing methods depend on the compilation process to extract features and cannot reliably tell patch-induced changes from compilation variations. Method: The paper proposes Lares, a patch presence testing method built on Code Slice Semantic Search. It extracts features directly from the patch source code and searches the pseudocode of the target binary for semantically equivalent code slices, using large language models for code analysis and SMT solvers for logical reasoning. Because no compilation step is required, the method does not rely on knowing the optimization level, architecture, or compiler that produced the binary. Results: The reported experiments show higher precision, recall, and usability than existing methods, and the paper is the first to evaluate patch presence testing across optimization levels, architectures, and compilers. The datasets and source code are publicly released.

📝 Abstract

In modern software ecosystems, 1-day vulnerabilities pose significant security risks due to extensive code reuse. Identifying vulnerable functions in target binaries alone is insufficient; it is also crucial to determine whether these functions have been patched. Existing methods, however, suffer from limited usability and accuracy. They often depend on the compilation process to extract features, requiring substantial manual effort and failing for certain software. Moreover, they cannot reliably differentiate between code changes caused by patches or compilation variations. To overcome these limitations, we propose Lares, a scalable and accurate method for patch presence testing. Lares introduces Code Slice Semantic Search, which directly extracts features from the patch source code and identifies semantically equivalent code slices in the pseudocode of the target binary. By eliminating the need for the compilation process, Lares improves usability, while leveraging large language models (LLMs) for code analysis and SMT solvers for logical reasoning to enhance accuracy. Experimental results show that Lares achieves superior precision, recall, and usability. Furthermore, it is the first work to evaluate patch presence testing across optimization levels, architectures, and compilers. The datasets and source code used in this article are available at https://github.com/Siyuan-Li201/Lares.

Problem

Research questions and friction points this paper is trying to address.

Determines whether vulnerable functions found in a target binary have actually been patched

Removes the dependence on the compilation process and the manual effort it requires

Distinguishes patch-induced changes from compilation variations reliably
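
To make the last point concrete, here is a minimal Python sketch of why purely syntactic comparison struggles. The three pseudocode snippets are hypothetical and invented for illustration (they do not come from the paper's dataset), and token-set Jaccard similarity is a deliberately simple stand-in for syntactic feature matching, not a model of any specific existing tool.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated token sets of a and b."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical snippets, invented for illustration only.
# The patch changes an off-by-one bounds check from `>` to `>=`.
patched = "if ( len >= size ) return -1 ;"
unpatched = "if ( len > size ) return -1 ;"
# Same patched logic as it might appear after a different build:
# the branch is inverted, but -1 is still returned exactly when len >= size.
patched_other_build = "if ( size > len ) goto ok ; return -1 ;"

sim_unpatched = jaccard(patched, unpatched)
sim_other_build = jaccard(patched, patched_other_build)

print(f"patched vs. unpatched:          {sim_unpatched:.2f}")
print(f"patched vs. patched (other build): {sim_other_build:.2f}")
```

In this toy case the unpatched snippet scores as more similar to the patched one than the semantically identical snippet from the other build does, which is the kind of confusion a semantic comparison is meant to avoid.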

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs for code analysis and semantic search

Employs SMT solvers for logical reasoning

Directly extracts features from patch source code
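
The role of logical reasoning can be illustrated with a small Python sketch. It is not the paper's implementation: the three conditions are hypothetical, and an exhaustive check over a small integer range stands in for an SMT solver, which would instead reason symbolically (and could model machine-integer overflow, which plain Python integers do not).

```python
from itertools import product


def find_counterexample(cond_a, cond_b, bits=6):
    """Return an input on which the two conditions disagree, or None.

    Brute-force stand-in for an SMT equivalence query: every pair of
    signed values representable in `bits` bits is tried.
    """
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for length, size in product(range(lo, hi), repeat=2):
        if cond_a(length, size) != cond_b(length, size):
            return (length, size)
    return None


# Hypothetical check added by a patch, as written in the source code.
def patch_check(length, size):
    return length > size - 1


# Hypothetical form of the same check in decompiled pseudocode.
def pseudocode_check(length, size):
    return not (length < size)


# Hypothetical pre-patch check (off by one).
def unpatched_check(length, size):
    return length > size


# No disagreement found: the pseudocode slice matches the patch logic.
match = find_counterexample(patch_check, pseudocode_check)
# A disagreement exists: the pre-patch check is not equivalent.
mismatch = find_counterexample(patch_check, unpatched_check)

print("patch vs. pseudocode counterexample:", match)
print("patch vs. unpatched counterexample: ", mismatch)
```

The design point is that equivalence is decided on what the conditions compute rather than how they are spelled, so a rewritten but equivalent check is accepted while a textually close but different check is rejected.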

🔎 Similar Papers

A Systematic Literature Review on Large Language Models for Automated Program Repair