🤖 AI Summary
This work addresses the challenge of manually repairing implicit data loss compiler warnings in industrial-scale C++ projects by proposing an automated approach based on large language models (LLMs). The method leverages the Language Server Protocol and Tree-sitter to precisely extract code context, enabling informed decisions on whether range checks are necessary and generating performance-conscious repair code. As the first study to apply LLMs to this specific problem, the system achieves a 92.73% human-review acceptance rate on a real-world project and reduces the instruction overhead introduced by range checks and exception handling by 39.09% compared to a baseline strategy, falling only 13.56% short of the optimal fixes written by human developers.
📝 Abstract
This paper presents a method to automatically fix implicit data loss warnings in large C++ projects using Large Language Models (LLMs). Our approach uses the Language Server Protocol (LSP) to gather context, Tree-sitter to extract the relevant code, and LLMs to make repair decisions and generate fixes. The method weighs the necessity of each range check against its performance implications and generates an appropriate fix. We evaluated this method on a large C++ project, where 92.73% of the fixes were accepted by human developers during code review. Compared to a baseline fix strategy, our LLM-generated fixes reduced by 39.09% the number of changes that introduced additional instructions due to range checks and exception handling; this result was 13.56% behind the optimal solutions created by human developers. These findings demonstrate that our LLM-based approach can reduce the manual effort of addressing compiler warnings while maintaining code quality and performance in a real-world scenario. The automated approach shows promise for integration into existing development workflows, potentially improving code maintenance practices in complex C++ software projects.