Code Smell Detection via Pearson Correlation and ML Hyperparameter Optimization

📅 2025-10-07
🤖 AI Summary
To address low accuracy and poor cross-dataset generalizability in code smell detection for large-scale software systems, this paper proposes a machine learning framework integrating data balancing, feature selection, and hyperparameter optimization. Specifically, it employs SMOTE for minority-class oversampling, Pearson correlation–based feature filtering, and three hyperparameter tuning strategies—grid search, random search, and Bayesian optimization—to systematically evaluate eight mainstream classifiers, including XGBoost, AdaBoost, and Random Forest. Experimental results demonstrate that AdaBoost achieves 100% accuracy, while XGBoost and Random Forest attain 99%; all three significantly outperform baseline methods across precision, recall, F1-score, and AUC. This work constitutes the first systematic validation of multi-strategy collaborative optimization in code smell detection, establishing a novel paradigm for high-accuracy, robustly generalizable, and reproducible automated software quality analysis.

📝 Abstract
This study addresses the challenge of detecting code smells in large-scale software systems using machine learning (ML). Traditional detection methods often suffer from low accuracy and poor generalization across datasets. To overcome these issues, we propose an ML-based model that automatically and accurately identifies code smells, offering a scalable solution for software quality analysis. The novelty of our approach lies in combining eight diverse ML algorithms, including XGBoost and AdaBoost, with two key techniques: the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance, and Pearson correlation for efficient feature selection. Together, these methods improve model accuracy and generalization. Our methodology has four steps: first, we preprocess the data and apply SMOTE to balance the dataset; next, we use Pearson correlation for feature selection to reduce redundancy; then, we train the eight ML algorithms and tune their hyperparameters through Grid Search, Random Search, and Bayesian Optimization; finally, we evaluate the models using accuracy, F-measure, and confusion matrices. The results show that AdaBoost, Random Forest, and XGBoost perform best, achieving accuracies of 100%, 99%, and 99%, respectively. This study provides a robust framework for detecting code smells, enhances software quality assurance, and demonstrates the effectiveness of a comprehensive, optimized ML approach.
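The two preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the 0.9 correlation threshold, the choice to compare features with each other (rather than with the label), and the simplified SMOTE-style interpolation are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_redundant_features(rows, threshold=0.9):
    """Return the column indices to keep.

    A column is dropped when its absolute correlation with an
    already-kept column exceeds `threshold` (an assumed value).
    """
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In practice a maintained library implementation of SMOTE would normally be preferred; the hand-written version here only shows the interpolation idea.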
Problem

Research questions and friction points this paper is trying to address.

Detecting code smells in large-scale software systems using machine learning
Addressing low accuracy and poor generalization in traditional detection methods
Optimizing ML models with feature selection and hyperparameter tuning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Pearson correlation for feature selection
Applies SMOTE technique to handle class imbalance
Optimizes hyperparameters with Grid Search, Random Search, and Bayesian Optimization
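To make the difference between the tuning strategies concrete, the following sketch contrasts grid search with random search over the same parameter grid. It is an illustration only: `toy_score` is a hypothetical stand-in for a cross-validated accuracy, the parameter names and values are assumed, and Bayesian optimization is omitted because it needs a surrogate model that does not fit in a short example.

```python
import itertools
import random


def grid_search(score_fn, grid):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, grid, n_iter=10, seed=0):
    """Evaluate `n_iter` randomly sampled combinations from the same grid."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for a cross-validated accuracy; a real run would
# train and validate a classifier for each parameter setting instead.
def toy_score(n_estimators, learning_rate):
    return -((n_estimators - 100) ** 2) / 1e4 - (learning_rate - 0.5) ** 2


# Assumed grid, loosely modelled on typical boosting hyperparameters.
grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.1, 0.5, 1.0]}
best_grid = grid_search(toy_score, grid)
best_random = random_search(toy_score, grid, n_iter=5)
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes; random search caps the number of evaluations at `n_iter` but may miss the best combination.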
Authors

Moinuddin Muhammad Imtiaz Bhuiyan
Dept. of Computer Science and Engineering, East Delta University, Chattogram, Bangladesh
Kazi Ekramul Hoque
Dept. of Computer Science and Engineering, East Delta University, Chattogram, Bangladesh
Rakibul Islam
Dept. of Computer Science and Engineering, East Delta University, Chattogram, Bangladesh
Md. Mahbubur Rahman Tusher
Dept. of Computer Science and Engineering, Bangladesh Army University of Science and Technology, Saidpur, Bangladesh
Najmul Hassan
University of Oregon
Computer Vision, Deep Learning
Yoichi Tomioka
Dept. of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan
Satoshi Nishimura
Dept. of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan
Jungpil Shin
Dept. of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan
Abu Saleh Musa Miah
Dept. of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan