🤖 AI Summary
This study addresses the lack of a general, fine-grained, and empirically validated taxonomy of software energy smells, a gap that hinders effective energy-efficiency optimization. Through a systematic literature review combined with snowball sampling, the authors propose a programming-language-agnostic classification of energy smells comprising 12 primary categories and 65 root causes. They further develop an LLM-driven, multi-step analysis pipeline to map these root causes to real-world code instances. The empirical validation profiles more than 21,000 functionally equivalent Python code pairs for energy consumption, execution time, and memory usage, then classifies the 3,000 pairs with the largest energy difference, mapping 55 of the 65 root causes to real code. Notably, 71% of the samples exhibit multiple co-occurring smells, and fixes targeting memory-related smells yield the highest per-fix energy savings. The project releases the labeled dataset, energy profiles, and reasoning traces, and its results indicate that energy optimization cannot be reduced to performance optimization alone.
📝 Abstract
As software proliferates across domains, its aggregate energy footprint has become a major concern. To reduce software's growing environmental footprint, developers need to identify and refactor energy smells: source code implementations, design choices, or programming practices that lead to inefficient use of computing resources. Existing catalogs of such smells are domain-specific, are limited to performance anti-patterns, lack fine-grained root cause classification, or remain unvalidated against measured energy data. In this paper, we present a comprehensive, language-agnostic taxonomy of software energy smells. Through a systematic literature review of 60 papers and exhaustive snowballing, we coded 320 inefficiency patterns into 12 primary energy smells and 65 root causes mapped to the primary smells. To empirically validate this taxonomy, we profiled over 21,000 functionally equivalent Python code pairs for energy, time, and memory, and classified the top 3,000 pairs by energy difference using a multi-step LLM pipeline, mapping 55 of the 65 root causes to real code. The analysis reveals that 71% of samples exhibit multiple co-occurring smells and that memory-related smells yield the highest per-fix energy savings, while power draw variation across patterns confirms that energy optimization cannot be reduced to performance optimization alone. Along with the taxonomy, we release the labeled dataset, including energy profiles and reasoning traces, to the community. Together, they provide a shared vocabulary, actionable refactoring guidelines, and an empirical foundation for energy smell detection, energy-efficient code generation, and green software engineering at large.
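The claim that energy optimization does not reduce to performance optimization follows from energy being average power multiplied by time: two fixes with the same speedup can save different amounts of energy if they change power draw differently. The sketch below illustrates this and shows one plausible way to rank code pairs by energy difference. It is not the authors' pipeline; the `Profile` and `CodePair` classes, the `top_k_by_energy_difference` helper, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Measured cost of one implementation (values here are made up)."""
    energy_j: float  # total energy in joules
    time_s: float    # execution time in seconds

    @property
    def avg_power_w(self) -> float:
        # Average power draw is energy divided by elapsed time.
        return self.energy_j / self.time_s


@dataclass
class CodePair:
    """A functionally equivalent pair: inefficient vs. efficient version."""
    name: str
    inefficient: Profile
    efficient: Profile

    @property
    def energy_saving_j(self) -> float:
        return self.inefficient.energy_j - self.efficient.energy_j

    @property
    def speedup(self) -> float:
        return self.inefficient.time_s / self.efficient.time_s


def top_k_by_energy_difference(pairs, k):
    """Return the k pairs with the largest energy saving, largest first."""
    return sorted(pairs, key=lambda p: p.energy_saving_j, reverse=True)[:k]


# Hypothetical measurements, not taken from the paper's dataset.
pairs = [
    # 2x faster at constant 10 W: saves 50 J.
    CodePair("a", Profile(100.0, 10.0), Profile(50.0, 5.0)),
    # 2x faster, but power rises from 10 W to 16 W: saves only 20 J.
    CodePair("b", Profile(100.0, 10.0), Profile(80.0, 5.0)),
    # No speedup, but power halves from 15 W to 7.5 W: saves 30 J.
    CodePair("c", Profile(60.0, 4.0), Profile(30.0, 4.0)),
]

ranked = top_k_by_energy_difference(pairs, 2)
for p in ranked:
    print(p.name, p.energy_saving_j, p.speedup)
```

In this toy data, pairs `a` and `b` have identical speedups yet different energy savings, and pair `c` saves energy with no speedup at all, so ranking by execution time alone would misorder them.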