🤖 AI Summary
Low-quality log statements, such as ambiguous or misleading ones, obscure actual program behavior and impede software maintenance. Prior work focuses mainly on detecting single log defects and relies on manual fixes. This paper proposes LogFixer, the first automated two-stage detection-and-repair framework targeting four types of real-world log defects. In the offline stage, a lightweight similarity classifier is trained on synthetically defective logs; in the online stage, problematic logs are identified via joint modeling of static textual features and dynamic variable contexts, and semantically appropriate repairs are recommended using large language models (LLMs). By pairing the lightweight classifier with LLMs in a synergistic paradigm, LogFixer achieves robust detection while improving repair validity. Evaluation shows an F1-score of 0.625; adoption rates of static and dynamic repair suggestions improve by 48.12% and 24.90%, respectively; repair-suggestion adoption reaches 61.49% on unseen projects; and 25 of 40 fixes submitted to GitHub have been confirmed and merged.
📝 Abstract
Developers use logging statements to monitor software, but misleading logs can complicate maintenance by obscuring actual activities. Existing research on logging-quality issues is limited, focusing mainly on single defects and manual fixes. To address this, we conducted a study that identified four defect types in logging statements through an analysis of real-world log changes. We propose LogFixer, a two-stage framework for the automatic detection and updating of logging statements. In the offline stage, LogFixer trains a similarity-based classifier on synthetic defective logs to identify defects. During the online phase, this classifier evaluates logs in code snippets to determine whether improvements are needed, and an LLM-based recommendation framework suggests updates informed by historical log changes. We evaluated LogFixer on real-world and synthetic datasets, as well as on new real-world projects, achieving an F1-score of 0.625. LogFixer significantly improved static-text and dynamic-variable suggestions by 48.12% and 24.90%, respectively, and achieved a 61.49% success rate in recommending correct updates for new projects. We reported 40 problematic logs to GitHub, resulting in 25 confirmed and merged changes across 11 projects.
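The detect-then-repair pipeline described above can be illustrated with a minimal sketch. Everything here is hypothetical: the function names, the token-overlap heuristic, and the prompt template are illustrative stand-ins, not the paper's actual classifier or API. The idea shown is only the two-stage shape: a cheap similarity check compares a log's static text against its dynamic variable context, and flagged logs are packaged into a prompt for an LLM to suggest a repair (the LLM call itself is omitted).

```python
def _tokens(s):
    # Crude tokenizer: split identifiers and messages into lowercase words.
    for ch in "_().,:%s{}":
        s = s.replace(ch, " ")
    return {t.lower() for t in s.split() if t}

def is_defective(log_text, variables, threshold=0.2):
    """Flag a log whose text shares too little vocabulary with the
    variables it prints -- a crude stand-in for a similarity classifier."""
    text_toks = _tokens(log_text)
    var_toks = set()
    for v in variables:
        var_toks |= _tokens(v)
    if not text_toks or not var_toks:
        return True  # empty message or no context: treat as suspicious
    overlap = len(text_toks & var_toks) / len(text_toks | var_toks)
    return overlap < threshold

def build_repair_prompt(log_text, variables, snippet):
    """Assemble the repair request that would be sent to an LLM."""
    return (
        "The following logging statement may be misleading.\n"
        f"Log text: {log_text}\n"
        f"Variables: {', '.join(variables)}\n"
        f"Surrounding code:\n{snippet}\n"
        "Suggest a corrected logging statement."
    )

# Example: the message claims a login succeeded, but the statement
# actually logs retry bookkeeping -- the mismatch gets flagged.
log = "user logged in successfully"
ctx = ["retry_count", "max_retries"]
if is_defective(log, ctx):
    prompt = build_repair_prompt(log, ctx, "while retry_count < max_retries: ...")
    print(prompt.splitlines()[0])  # → The following logging statement may be misleading.
```

A consistent log such as `"retry count exceeded max retries"` with the same variables passes the check, so only mismatched statements incur the (comparatively expensive) LLM call, mirroring the paper's lightweight-classifier-first design.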