SemOpt: LLM-Driven Code Optimization via Rule-Based Analysis

📅 2025-10-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-based code optimization approaches retrieve optimization examples with lexical methods such as BM25, which miss optimization patterns that are semantically equivalent but syntactically different, limiting the effectiveness of LLM-guided optimization. This work proposes SemOpt, a framework that combines large language models (LLMs) with static program analysis to optimize code at the semantic level. SemOpt uses LLMs to generate Semgrep rules that capture the conditions under which an optimization strategy applies. It combines rule matching with a strategy library clustered from real-world code modifications to locate optimizable code regions and generate optimized code. On a benchmark of 151 optimization tasks, SemOpt increases the number of successful optimizations by 1.38 to 28 times compared to the baseline, depending on the LLM used. On popular large-scale C/C++ projects, it improves individual performance metrics by 5.04% to 218.07%.

📝 Abstract

Automated code optimization aims to improve program performance by refactoring code, and recent studies focus on using LLMs for this task. Typical existing approaches mine optimization commits from open-source codebases to construct a large-scale knowledge base, then employ information retrieval techniques such as BM25 to retrieve relevant optimization examples for hotspot code locations, thereby guiding LLMs to optimize these hotspots. However, since semantically equivalent optimizations can manifest in syntactically dissimilar code snippets, current retrieval methods often fail to identify pertinent examples, which significantly reduces the effectiveness of existing optimization approaches. To address this limitation, we propose SemOpt, a novel framework that leverages static program analysis to precisely identify optimizable code segments, retrieve the corresponding optimization strategies, and generate the optimized results. SemOpt consists of three key components: (1) a strategy library builder that extracts and clusters optimization strategies from real-world code modifications; (2) a rule generator that generates Semgrep static analysis rules to capture the conditions for applying each optimization strategy; and (3) an optimizer that utilizes the strategy library to generate optimized code. All three components are powered by LLMs. On our benchmark containing 151 optimization tasks, SemOpt demonstrates its effectiveness under different LLMs by increasing the number of successful optimizations by 1.38 to 28 times compared to the baseline. Moreover, on popular large-scale C/C++ projects, it can improve individual performance metrics by 5.04% to 218.07%, demonstrating its practical utility.

Problem

Research questions and friction points this paper is trying to address.

Retrieving semantically equivalent optimizations from syntactically dissimilar code snippets

Improving LLM-based code optimization performance through rule-based analysis

Addressing limitations of current retrieval methods for code optimization examples
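
The retrieval problem listed above can be illustrated with a small sketch. It does not implement BM25; it uses plain token-set (Jaccard) overlap as a simplified stand-in for lexical similarity, and the three C++ snippets are hypothetical examples written for this illustration, not taken from the paper. The first two snippets share the same inefficiency (copying each element inside a loop), while the third looks almost identical to the first but modifies the copy, so the copy is actually needed.

```python
import re


def tokens(code: str) -> set[str]:
    """Split code into a set of identifier and number tokens (purely lexical)."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simplified stand-in for lexical retrieval, not BM25."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


# Hypothetical hotspot: each element is copied although it is only read.
query = (
    "for (size_t i = 0; i < names.size(); ++i) "
    "{ std::string s = names[i]; use(s); }"
)

# Same optimization opportunity, written with different syntax and names.
same_pattern = (
    "for (auto it = rows.begin(); it != rows.end(); ++it) "
    "{ Record r = *it; handle(r); }"
)

# Nearly identical text, but the copy is modified, so it cannot simply be
# replaced by a const reference.
lookalike = (
    "for (size_t i = 0; i < names.size(); ++i) "
    "{ std::string s = names[i]; s += suffix; use(s); }"
)

if __name__ == "__main__":
    print("query vs same_pattern:", round(jaccard(query, same_pattern), 3))
    print("query vs lookalike:   ", round(jaccard(query, lookalike), 3))
```

Under this proxy, the lexically similar snippet scores far higher than the one that shares the actual optimization opportunity, which is the kind of mismatch the abstract attributes to current retrieval methods.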

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses static analysis rules to identify optimizable code segments

Builds strategy library from clustered real-world code modifications

Leverages LLMs across all three optimization framework components
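
How a strategy library and its applicability rules might fit together can be sketched as follows. This is a toy under loud assumptions: the `Strategy` class, the `LIBRARY` contents, and the `build_prompt` helper are hypothetical names invented for illustration, and a Python regular expression stands in for a Semgrep rule, which the summary does not show. The paper's actual rule format, clustering step, and prompts are not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class Strategy:
    """One hypothetical library entry: a strategy plus the rule that gates it."""
    name: str
    description: str
    rule: re.Pattern  # assumption: a regex standing in for a Semgrep rule


LIBRARY = [
    Strategy(
        name="copy-in-loop",
        description="Bind the loop element by const reference instead of copying it.",
        # Matches a loop whose body starts with `Type var = container[i];`
        # or `Type var = *it;` (a deliberately crude approximation).
        rule=re.compile(
            r"for\s*\(.*\)\s*\{\s*[\w:]+\s+\w+\s*=\s*\*?\w+(\[\w+\])?\s*;"
        ),
    ),
]


def find_applicable(code: str, library: list[Strategy]) -> list[Strategy]:
    """Return the strategies whose rule matches the given code."""
    return [s for s in library if s.rule.search(code)]


def build_prompt(code: str, strategies: list[Strategy]) -> str:
    """Assemble a hypothetical optimizer prompt from the matched strategies."""
    hints = "\n".join(f"- {s.name}: {s.description}" for s in strategies)
    return f"Apply these strategies where safe:\n{hints}\n\nCode:\n{code}"


indexed_loop = (
    "for (size_t i = 0; i < names.size(); ++i) "
    "{ std::string s = names[i]; use(s); }"
)
iterator_loop = (
    "for (auto it = rows.begin(); it != rows.end(); ++it) "
    "{ Record r = *it; handle(r); }"
)
already_optimized = "for (const auto& r : rows) { handle(r); }"

if __name__ == "__main__":
    for snippet in (indexed_loop, iterator_loop, already_optimized):
        matched = find_applicable(snippet, LIBRARY)
        print([s.name for s in matched])
```

The point of the design is that a single rule fires on both loop styles even though they share almost no tokens, so the matching strategy is found by structure rather than by textual similarity to a stored example.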
