AI Summary
To address the high computational cost and deployment challenges of large language models (LLMs) in automated program repair (APR), this work systematically evaluates the repair capability of small language models (SLMs) under resource-constrained settings. Using the QuixBugs benchmark, we empirically assess multiple state-of-the-art SLMs and apply INT8 quantization to reduce their memory footprint. Results show that the best-performing SLM achieves repair accuracy comparable to mainstream LLMs (up to 68.2%) without quantization; after INT8 quantization, accuracy remains unchanged while GPU memory usage drops by roughly 50% and inference latency decreases significantly. This study is the first to demonstrate that carefully selected, lightweight-optimized SLMs can jointly achieve high efficiency and effectiveness in APR, establishing a viable new paradigm for code intelligence in edge and low-resource environments.
Abstract
Background: Large language models (LLMs) have greatly improved the accuracy of automated program repair (APR) methods. However, LLMs are constrained by their high computational resource requirements. Aims: We focus on small language models (SLMs), which, unlike LLMs, perform well even with limited computational resources. We aim to evaluate whether SLMs can achieve competitive performance on APR tasks. Method: We conducted experiments on the QuixBugs benchmark to compare the bug-fixing accuracy of SLMs and LLMs. We also analyzed the impact of INT8 quantization on APR performance. Results: The latest SLMs can fix bugs as accurately as, or even more accurately than, LLMs. Moreover, INT8 quantization had minimal effect on APR accuracy while significantly reducing memory requirements. Conclusions: SLMs present a viable alternative to LLMs for APR, offering competitive accuracy at lower computational cost, and quantization can further enhance their efficiency without compromising effectiveness.
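To illustrate why INT8 quantization roughly halves memory with little accuracy impact, here is a minimal sketch of symmetric per-tensor INT8 quantization using NumPy. This is our own illustration of the general technique, not the paper's implementation (which would typically use a library such as bitsandbytes on real model weights); the function names and the fp16 baseline are assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 using a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4096,)).astype(np.float16)  # toy fp16 weights

q, scale = quantize_int8(w.astype(np.float32))
w_hat = dequantize(q, scale)

# INT8 storage is half the size of the fp16 original ...
print(q.nbytes / w.nbytes)
# ... while the per-weight round-trip error stays below one scale step.
print(np.abs(w.astype(np.float32) - w_hat).max() < scale)
```

Relative to a 16-bit baseline, each weight shrinks from 2 bytes to 1 (the ~50% reduction reported above), and the rounding error per weight is bounded by the quantization step, which is why repair accuracy can survive quantization essentially unchanged.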