🤖 AI Summary
LLM-based software engineering (SE) research faces systemic challenges, including non-rigorous benchmarking, data contamination, poor reproducibility, and high computational costs. Method: We conduct a structured empirical analysis of 87 LLM-based SE papers published at ICSE 2021–2024. Contribution/Results: This is the first systematic study to expose widespread deficiencies in current practice, particularly concerning data isolation, evaluation protocols, code and model reproducibility support, and carbon-footprint reporting. Based on these findings, we propose three actionable recommendations: (1) a hierarchical, task-adapted, contamination-resistant benchmarking framework; (2) mandatory disclosure of minimal reproducible units, including dataset slices, prompt templates, and lightweight model checkpoints; and (3) a dual-dimension sustainability evaluation standard integrating computational cost and carbon emissions. Our work establishes a responsible, operational research paradigm for the SE community and informs evidence-based policy development.
📝 Abstract
Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to rigour in benchmarking, contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are addressed in SE. To this end, we conduct a structured empirical analysis of 87 LLM-based SE papers published at ICSE 2021–2024. Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE.