Build It Clean: Large-Scale Detection of Code Smells in Build Scripts

📅 2025-06-22

🤖 AI Summary

Code smells in build scripts undermine build reliability and accumulate technical debt. Method: We identify and define 13 build-script-specific code smells through a large-scale empirical study of 5,882 Maven, Gradle, CMake, and Make scripts from 4,877 open-source GitHub repositories. We detect 10,895 smell instances with our custom static analysis tool, Sniffer, and qualitatively analyze 2,000 build-script-related GitHub issues. Contribution/Results: We uncover co-occurrence patterns among smells (e.g., Hardcoded Paths/URLs are strongly associated with Duplicates) and tool-specific prevalence: Makefiles account for the most smell instances (6,160 of 10,895); Insecure URLs are the most prevalent smell in Maven; Wildcard Usage is the most frequent smell in Makefiles. Our findings provide empirically grounded recommendations for improving build script quality and maintainability.

📝 Abstract

Build scripts are files that automate the process of compiling source code, managing dependencies, running tests, and packaging software into deployable artifacts. These scripts are ubiquitous in modern software development pipelines for streamlining testing and delivery. While developing build scripts, practitioners may inadvertently introduce code smells. Code smells are recurring patterns of poor coding practices that may lead to build failures or increase risk and technical debt. The goal of this study is to aid practitioners in avoiding code smells in build scripts through an empirical study of build scripts and issues on GitHub. We employed a mixed-methods approach, combining qualitative and quantitative analysis. We conducted a qualitative analysis of 2,000 build-script-related GitHub issues. Next, we developed a static analysis tool, Sniffer, to identify code smells in 5,882 Maven, Gradle, CMake, and Make build scripts collected from 4,877 open-source GitHub repositories. We identified 13 code smell categories, with a total of 10,895 smell occurrences: 3,184 in Maven, 1,214 in Gradle, 337 in CMake, and 6,160 in Makefiles. Our analysis revealed that Insecure URLs were the most prevalent code smell in Maven build scripts, while Hardcoded Paths/URLs were commonly observed in both Gradle and CMake scripts. Wildcard Usage emerged as the most frequent smell in Makefiles. The co-occurrence analysis revealed strong associations between two smell pairs: Hardcoded Paths/URLs with Duplicates, and Inconsistent Dependency Management with Empty or Incomplete Tags. These associations indicate potential underlying issues in build script structure and maintenance practices. Based on our findings, we recommend strategies to mitigate code smells in build scripts and thereby improve the efficiency, reliability, and maintainability of software projects.
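
The abstract does not say how co-occurrence was measured, so the following is only a sketch of one common way to quantify it: count how often two smells appear in the same script and compute their lift. The per-script smell sets are invented toy data, and the function names `cooccurrence` and `lift` are hypothetical; only the smell names come from the abstract.

```python
from collections import Counter
from itertools import combinations

# Toy data (invented for illustration): the set of smells found in each script.
scripts = [
    {"Hardcoded Paths/URLs", "Duplicates"},
    {"Hardcoded Paths/URLs", "Duplicates", "Wildcard Usage"},
    {"Wildcard Usage"},
    {"Insecure URLs"},
]


def cooccurrence(smell_sets):
    """Count how often each smell, and each unordered smell pair, appears."""
    single = Counter()
    pairs = Counter()
    for smells in smell_sets:
        single.update(smells)
        # Sort so that each unordered pair always maps to the same tuple key.
        pairs.update(combinations(sorted(smells), 2))
    return single, pairs


def lift(pair, single, pairs, n):
    """Ratio of observed joint frequency to the frequency expected under independence."""
    a, b = pair
    return (pairs[pair] / n) / ((single[a] / n) * (single[b] / n))


single, pairs = cooccurrence(scripts)
pair = ("Duplicates", "Hardcoded Paths/URLs")
print(pair, pairs[pair], lift(pair, single, pairs, len(scripts)))
```

A lift above 1 means the two smells appear together more often than they would if they were independent. Other association measures could be substituted without changing the counting step.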

Problem

Research questions and friction points this paper is trying to address.

Detect code smells in build scripts to prevent failures

Analyze GitHub issues to identify poor coding practices

Develop tool to improve script reliability and maintainability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-methods analysis of GitHub issues

Static analysis tool Sniffer for code smells

Identified 13 code smell categories
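
To make the idea of static smell detection concrete, here is a minimal sketch of a line-based, regex-driven checker. It is not Sniffer's implementation, which this page does not describe. The rule patterns, the `detect_smells` function, and the sample Makefile are all assumptions made for illustration; only the smell names are taken from the abstract.

```python
import re
from collections import Counter

# Hypothetical rule set: the smell names come from the abstract, but the
# regular expressions are illustrative assumptions, not Sniffer's rules.
RULES = {
    "Insecure URL": re.compile(r'http://[^\s<>"]+'),
    "Hardcoded Path": re.compile(r"(?<![\w$}:/])/(?:usr|opt|home|etc)/[\w./-]+"),
    "Wildcard Usage": re.compile(r"\$\(wildcard\s+[^)]*\)"),
}


def detect_smells(script_text):
    """Return (line_number, smell_name, matched_text) for each rule hit."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):  # skip Make/CMake-style comment lines
            continue
        for name, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings


# Invented sample Makefile used only to exercise the rules above.
MAKEFILE = """\
# fetch a dependency
SRCS := $(wildcard src/*.c)
INSTALL_DIR := /opt/myapp/bin
deps:
\tcurl -O http://example.com/lib.tar.gz
"""

findings = detect_smells(MAKEFILE)
counts = Counter(name for _, name, _ in findings)
for lineno, name, text in findings:
    print(f"line {lineno}: {name}: {text}")
```

Line-based regexes are the simplest possible design and will miss smells that span lines or depend on structure. A checker for Maven's XML or Gradle's DSL would more plausibly parse the file first, which this sketch does not attempt.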
