🤖 AI Summary
Low-quality code in software systems commonly incurs high maintenance costs and accumulates technical debt. To address this, we present an empirical study of the edit histories of 140,840 accepted Java answers on Stack Overflow. The study traces how community edits to answers can be applied to improve real-world code. The approach combines the SOTorrent dataset, a modified code clone search tool that handles versioned code snippets, and an evaluation on 10,668 GitHub Java projects. Our analysis shows that 6.91% of the accepted answers have more than one revision (2.82 on average), and that 49.24% of the code snippets in the answer edits are applicable to open-source projects. We submitted 36 proposed bug fixes to open-source projects, and maintainers accepted 11 of them. The key contribution is turning collaboratively evolved Q&A knowledge into a verifiable, actionable way to improve code quality, which links community-driven insights to measurable software engineering impact.
📝 Abstract
Suboptimal code is prevalent in software systems. Developers often write low-quality code because of technical knowledge gaps, insufficient experience, time pressure, management decisions, or personal factors. Once such code is integrated, its accumulation leads to significant maintenance costs and technical debt. Developers frequently consult external knowledge bases, such as API documentation and Q&A websites like Stack Overflow (SO), to help with their programming tasks. SO's crowdsourced, collaborative nature has created a vast repository of programming knowledge. Its community-curated content evolves constantly as new answers are posted and existing ones are edited. In this paper, we present an empirical study of SO Java answer edits and their application to improving code in open-source projects. We use a modified code clone search tool to analyze SO code snippets together with their version history, and we apply it to open-source Java projects to identify outdated or unoptimized code and suggest improved alternatives. After analyzing 140,840 accepted Java answers from SOTorrent and 10,668 GitHub Java projects, we manually categorized the SO answer edits and created pull requests to open-source projects with the suggested code improvements. Our results show that 6.91% of accepted SO Java answers have more than one revision (2.82 revisions on average). Moreover, 49.24% of the code snippets in the answer edits are applicable to open-source projects, and GitHub project maintainers accepted 11 of the 36 bug fixes we proposed based on these edits.
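The matching idea described above (find project code that resembles an *older* revision of an SO snippet, then suggest the *newer* revision) can be illustrated with a minimal sketch. This is not the authors' modified clone search tool, which is not described in detail here. It substitutes a simple token-shingle Jaccard similarity, and every name in it (`EditBasedSuggester`, `SnippetEdit`, the n-gram size, the threshold) is hypothetical. It requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: flag project code that is a near-clone of an OLD
 * revision of a Stack Overflow snippet and suggest the NEWER revision.
 * Token-shingle Jaccard similarity stands in for a real clone search tool.
 */
public class EditBasedSuggester {

    /** Hypothetical pair of revisions of one SO code snippet. */
    record SnippetEdit(String oldCode, String newCode) {}

    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    /** Split code into identifiers, numbers, and single non-space characters; whitespace is ignored. */
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Build the set of contiguous token n-grams (shingles). */
    static Set<String> shingles(List<String> tokens, int n) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    /** Jaccard similarity: size(A intersect B) / size(A union B); 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /**
     * Suggest the edited snippet when the project code is at least `threshold`
     * similar to the old revision and closer to it than to the new revision.
     */
    static Optional<String> suggest(String projectCode, SnippetEdit edit, int n, double threshold) {
        Set<String> project = shingles(tokenize(projectCode), n);
        double simOld = jaccard(project, shingles(tokenize(edit.oldCode()), n));
        double simNew = jaccard(project, shingles(tokenize(edit.newCode()), n));
        if (simOld >= threshold && simOld > simNew) {
            return Optional.of(edit.newCode());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative edit: string concatenation in a loop replaced by StringBuilder.
        SnippetEdit edit = new SnippetEdit(
                "String s = \"\"; for (String p : parts) { s += p; }",
                "StringBuilder sb = new StringBuilder(); for (String p : parts) { sb.append(p); }");
        // Same tokens as the old revision, different layout.
        String projectCode = "String s = \"\";\nfor (String p : parts) {\n    s += p;\n}";
        // Expected: the StringBuilder revision is printed.
        System.out.println(suggest(projectCode, edit, 3, 0.6).orElse("no suggestion"));
    }
}
```

Comparing against both revisions, not only the old one, avoids suggesting an edit to code that already matches the improved version. A real clone detector would also normalize identifiers and handle larger gaps, which this sketch does not attempt.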