🤖 AI Summary
This study addresses the emerging yet underexplored sustainability implications of code generated by large language models (LLMs), which, when deployed at scale, can incur substantial energy consumption and environmental impact due to inefficiencies in the generated code. Through a systematic literature review, the work provides a first structured synthesis of existing research, critically examining key dimensions such as prompt engineering, fine-tuning strategies, and energy-efficiency evaluation metrics. The analysis reveals a lack of consensus on how sustainability is defined in this context, along with the absence of standardized measurement methodologies and benchmarking frameworks. The paper calls for a coherent research agenda and a unified evaluation framework to guide the development of future LLM-based code generation systems toward greater efficiency and environmental responsibility.
📝 Abstract
Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model training and inference, far less attention has been given to the sustainability of the code these models produce. The efficiency of generated code shapes the long-term environmental impact of software systems: inefficient code can increase CPU usage, memory consumption, execution time, and overall energy use during deployment and operation. As LLM-generated code becomes more common in real-world projects, even small inefficiencies can accumulate into substantial environmental costs. This paper examines existing research on the sustainability of code generated by LLMs. We conduct a systematic literature review to analyze selected primary studies and investigate the extent to which LLMs are capable of producing sustainable code. In addition, we examine how sustainability is defined and measured in this context, including the metrics and evaluation strategies used to assess energy efficiency and resource usage. We also explore whether techniques such as fine-tuning and prompt engineering influence the sustainability of generated code. Through a structured analysis of the selected studies, we categorize research efforts by their methodological approaches, evaluation practices, and experimental settings. The findings indicate that research in this area remains limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code. These observations highlight the need for clearer definitions, standardized evaluation methods, and systematic research to support environmentally responsible AI-assisted software engineering.