🤖 AI Summary
Traditional software optimization relies on manual tuning and compiler heuristics that generalize poorly across codebases, and existing LLM-based approaches fail to scale to complex real-world systems. Method: SysLLMatic is an automated optimization framework that couples large language models (LLMs) with system-level performance diagnosis, combining multi-dimensional profiling feedback (latency, throughput, energy consumption, and CPU/memory utilization) with cross-language (C++/Java) adaptation. Contribution/Results: The approach goes beyond pure LLM code generation and conventional compiler optimization, enabling end-to-end, interpretable, and generalizable code-level optimization. On large-scale real-world applications it achieves average relative improvements of 1.85× in latency and 2.24× in throughput over mainstream compiler optimizations, and it consistently outperforms state-of-the-art LLM baselines on microbenchmarks.
📝 Abstract
Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) offer automation to address these limitations, but often fail to scale to the complexity of real-world software systems and applications. We present SysLLMatic, a system that integrates LLMs with profiling-guided feedback and system performance insights to automatically optimize software code. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapoBench (large-scale software systems in Java). Results show that SysLLMatic can improve system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, it surpasses traditional compiler optimizations, achieving average relative improvements of 1.85x in latency and 2.24x in throughput. Our findings demonstrate that LLMs, guided by principled systems thinking and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.
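To make the described feedback loop concrete, here is a minimal sketch of what profiling-guided LLM optimization could look like: profile the program, hand the diagnostics to an LLM, and keep a rewrite only if it still passes the tests and measures faster. This is an illustration in Python, not the paper's implementation; `profile`, `llm_rewrite`, and `passes_tests` are assumed placeholder callables, and the single-metric acceptance check stands in for the paper's multi-dimensional criteria (latency, throughput, energy, CPU/memory).

```python
from typing import Callable, Dict

Metrics = Dict[str, float]  # e.g. {"latency": ..., "throughput": ..., "energy": ...}

def optimize(
    source: str,
    profile: Callable[[str], Metrics],           # build, run, and measure the program (assumed helper)
    llm_rewrite: Callable[[str, Metrics], str],  # prompt an LLM with code + diagnostics (assumed helper)
    passes_tests: Callable[[str], bool],         # functional-correctness check (assumed helper)
    max_iters: int = 5,
) -> str:
    """Iteratively rewrite `source`, keeping only correct candidates that measure faster."""
    best_code, best_metrics = source, profile(source)
    for _ in range(max_iters):
        candidate = llm_rewrite(best_code, best_metrics)
        if not passes_tests(candidate):
            continue  # discard rewrites that break behavior
        metrics = profile(candidate)
        if metrics["latency"] < best_metrics["latency"]:
            best_code, best_metrics = candidate, metrics  # accept the improvement
    return best_code
```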