🤖 AI Summary
TODO comments in open-source projects are often of low quality (46.7% are vague, information-deficient, or of no practical use) and frequently go unresolved, which calls for systematic governance. This study first proposes a multidimensional criterion for high-quality TODOs, derived empirically from a lifecycle analysis and a comparison of management practices for 2,863 TODOs from GitHub’s top 100 Java repositories. We then develop the first fine-tuned CodeBERT-based model for TODO quality assessment, achieving an F1-score of 0.89 on binary classification. Finally, we deliver actionable writing guidelines and governance recommendations. Our core contributions are threefold: (1) theoretically, the first comprehensive TODO quality assessment framework; (2) methodologically, the first deep learning–driven automated quality identification system; and (3) practically, community-adoptable, evidence-based pathways for improving TODO quality in open-source development.
📝 Abstract
Software development is a collaborative process that involves various interactions among individuals and teams. TODO comments in source code play a critical role in managing and coordinating diverse tasks during this process. However, this study finds that a large proportion of TODO comments in open-source projects are left unresolved or take a long time to be resolved. About 46.7% of TODO comments in open-source repositories are of low quality (e.g., TODOs that are ambiguous, lack information, or are useless to developers). This highlights the need for better TODO practices. In this study, we investigate four aspects of the quality of TODO comments in open-source projects: (1) the prevalence of low-quality TODO comments; (2) the key characteristics of high-quality TODO comments; (3) how TODO comments of different quality are managed in practice; and (4) the feasibility of automatically assessing TODO comment quality. Examining 2,863 TODO comments from the top 100 GitHub Java repositories, we propose criteria to identify high-quality TODO comments and provide insights into their optimal composition. We discuss the lifecycle of TODO comments of varying quality. To assist developers, we construct deep learning-based methods that show promising performance in identifying the quality of TODO comments, potentially enhancing development efficiency and code quality.