🤖 AI Summary
Large language models (LLMs) struggle with global algorithmic innovation when optimizing code for performance (specifically, minimizing execution time) because existing optimization data encourages local iterative refinement. Method: We propose a problem-centric paradigm for constructing optimization data: aggregating diverse solutions from multiple programmers for the same programming task to generate high-quality optimization pairs. To mitigate the “optimization tax,” we introduce an anchor-based verification mechanism that leverages execution feedback to jointly ensure correctness, efficiency, and readability. Our approach integrates multi-source program synthesis, execution-driven validation, LLM fine-tuning, and prompt engineering. Contribution/Results: Experiments demonstrate substantial improvements: optimization success rate increases by 23.5% across multiple benchmarks, and average runtime speedup reaches 2.1×, confirming both efficacy and robustness in real-world code optimization.
📝 Abstract
Large language models (LLMs) have shown remarkable capabilities in solving various programming tasks, such as code generation. However, their potential for code optimization, particularly for performance enhancement, remains largely unexplored. This paper investigates the ability of LLMs to optimize code for minimal execution time, addressing a critical gap in current research. A recently proposed code optimization dataset constructs program optimization pairs from iterative submissions by the same programmer for the same problem. However, this approach limits LLMs to local performance improvements and neglects global algorithmic innovation. To overcome this limitation, we take a different perspective and reconstruct the optimization pairs in a problem-oriented way, which integrates ideas from multiple programmers tackling the same problem. Experimental results demonstrate that adapting LLMs to problem-oriented optimization pairs significantly enhances their optimization capabilities. Furthermore, recognizing the inherent trade-offs in code optimization, we introduce an anchor verification mechanism to mitigate the "optimization tax". Ultimately, our approach elevates both the optimization ratio and speedup to new levels.
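To make the contrast between the two pairing strategies concrete, the following is a minimal sketch, not the authors' implementation. The `Submission` record, its field names, and the `min_speedup` threshold are all hypothetical, and it assumes every submission has already been verified as correct and timed under the same conditions. User-oriented pairing links consecutive submissions by one programmer, while problem-oriented pairing links any slower solution to any faster one for the same problem, regardless of author.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Submission:
    # Hypothetical record layout; the paper's actual schema is not given here.
    problem_id: str
    user_id: str
    code: str
    runtime_ms: float  # assumed to be measured under identical conditions


def user_oriented_pairs(submissions):
    """Pair consecutive submissions by the same user on the same problem.

    Assumes the input list is in chronological order.
    """
    by_key = defaultdict(list)
    for s in submissions:
        by_key[(s.problem_id, s.user_id)].append(s)
    pairs = []
    for subs in by_key.values():
        for earlier, later in zip(subs, subs[1:]):
            if earlier.runtime_ms > later.runtime_ms:
                pairs.append((earlier, later))
    return pairs


def problem_oriented_pairs(submissions, min_speedup=1.0):
    """Pair any slower solution with any faster one for the same problem.

    min_speedup is an illustrative threshold: a pair is kept only if the
    slow runtime exceeds the fast runtime multiplied by this factor.
    """
    by_problem = defaultdict(list)
    for s in submissions:
        by_problem[s.problem_id].append(s)
    pairs = []
    for subs in by_problem.values():
        for slow, fast in permutations(subs, 2):
            if slow.runtime_ms > fast.runtime_ms * min_speedup:
                pairs.append((slow, fast))
    return pairs
```

In this sketch, a fast solution written by one programmer can serve as the optimization target for a slow solution written by another, which is how cross-programmer algorithmic ideas would enter the training pairs. The sketch does not cover the anchor verification step.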