An Empirical Study of Java Code Improvements Based on Stack Overflow Answer Edits

📅 2025-11-08
🤖 AI Summary
Low-quality code in software systems commonly incurs high maintenance costs and accumulates technical debt. To address this, we present an empirical study of the edit histories of over 140,000 accepted Java answers on Stack Overflow, tracing how community-edited knowledge can be applied to improve real-world code. Our approach relies on clone detection over versioned code snippets, combining the SOTorrent dataset, a modified code clone search tool, and validation across GitHub projects. Our analysis shows that 6.91% of the accepted answers have more than one revision, and that 49.24% of the code snippets in the answer edits are applicable to open-source projects. We proposed 36 fixes to open-source projects based on these edits; maintainers accepted 11. The key contribution is turning collaboratively edited Q&A knowledge into concrete, verifiable code improvements for existing projects.
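The percentages above are easier to judge as absolute numbers. The short calculation below is only a back-of-the-envelope check: it assumes that the 6.91% share is measured against the 140,840 accepted answers reported in the abstract, and the resulting count is approximate because the percentage is rounded.

```python
# Figures quoted in the summary and abstract.
accepted_answers = 140_840      # accepted Java answers analyzed
multi_revision_share = 0.0691   # share with more than one revision
fixes_proposed = 36
fixes_accepted = 11

# Assumption: the 6.91% share applies to all 140,840 analyzed answers.
multi_revision_answers = round(accepted_answers * multi_revision_share)
acceptance_rate = fixes_accepted / fixes_proposed

print(f"answers with more than one revision: about {multi_revision_answers:,}")
print(f"share of proposed fixes accepted: {acceptance_rate:.1%}")
```

Under that assumption, roughly 9,700 answers carry an edit history, and a little under a third of the proposed fixes were accepted.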

📝 Abstract
Suboptimal code is prevalent in software systems. Developers often write low-quality code because of technical knowledge gaps, insufficient experience, time pressure, management decisions, or personal circumstances. Once integrated, this suboptimal code accumulates and leads to significant maintenance costs and technical debt. Developers frequently consult external knowledge bases, such as API documentation and Q&A websites like Stack Overflow (SO), to aid their programming tasks. SO's crowdsourced, collaborative nature has created a vast repository of programming knowledge. Its community-curated content is constantly evolving, with new answers posted and existing ones edited. In this paper, we present an empirical study of SO Java answer edits and their application to improving code in open-source projects. We use a modified code clone search tool to analyze SO code snippets with version history and apply it to open-source Java projects. The tool identifies outdated or unoptimized code and suggests improved alternatives. Analyzing 140,840 accepted Java answers from SOTorrent and 10,668 GitHub Java projects, we manually categorized SO answer edits and created pull requests to open-source projects with the suggested code improvements. Our results show that 6.91% of accepted SO Java answers have more than one revision (2.82 revisions on average). Moreover, 49.24% of the code snippets in the answer edits are applicable to open-source projects, and 11 out of 36 proposed bug fixes based on these edits were accepted by the GitHub project maintainers.
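The idea of matching project code against older versions of an SO snippet can be illustrated with a minimal sketch. This is not the paper's clone search tool: the token-set Jaccard similarity, the 0.8 threshold, and the names `AnswerHistory` and `suggest_update` are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AnswerHistory:
    """Hypothetical record: one SO answer's code snippet versions, oldest first."""
    answer_id: int
    versions: List[str]

# Identifiers, numbers, or single non-space symbols; whitespace is ignored.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def similarity(a: str, b: str) -> float:
    """Jaccard similarity over token sets (a stand-in for a real clone detector)."""
    ta, tb = set(TOKEN.findall(a)), set(TOKEN.findall(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def suggest_update(project_code: str, history: AnswerHistory,
                   threshold: float = 0.8) -> Optional[str]:
    """Return the latest snippet version if project_code best matches an older one."""
    if len(history.versions) < 2:
        return None  # no edit history, nothing to suggest
    scores = [similarity(project_code, v) for v in history.versions]
    best = max(range(len(scores)), key=scores.__getitem__)
    latest = len(history.versions) - 1
    if best != latest and scores[best] >= threshold:
        return history.versions[latest]
    return None

# Made-up example: the answer was edited to replace == with equals().
history = AnswerHistory(
    answer_id=1,  # hypothetical ID
    versions=[
        'if (name == "admin") { grant(); }',
        'if ("admin".equals(name)) { grant(); }',
    ],
)
project_code = 'if (name == "admin") {\n    grant();\n}'
suggestion = suggest_update(project_code, history)
```

Here the project code matches the first version of the answer, so the sketch proposes the edited version. Code that already matches the latest version yields no suggestion.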
Problem

Research questions and friction points this paper is trying to address.

Identifying suboptimal Java code in open-source projects using Stack Overflow edits
Analyzing code improvements from Stack Overflow answer version history
Reducing technical debt by suggesting optimized code alternatives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using a modified code clone search tool to analyze versioned Stack Overflow code snippets
Applying Stack Overflow answer edits to open-source Java projects
Submitting pull requests with the suggested improvements
In-on Wiratsin
Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
Chaiyong Ragkhitwetsagul
Assistant Professor, Faculty of ICT, Mahidol University
Software Engineering, Mining Software Repositories, Code Similarity, Empirical Studies
Matheus Paixao
Assistant Professor (Tenured) at State University of Ceará
Mining Software Repositories, Empirical Software Engineering, Search-based Software Engineering
Denis De Sousa
State University of Ceara (UECE), Fortaleza, Brazil
Pongpop Lapvikai
Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
Peter Haddawy
Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand