Investigating Software Aging in LLM-Generated Software Systems

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Long-term reliability issues, such as unbounded memory growth, increasing response latency, and performance instability, remain uncharacterized in software generated by large language models (LLMs). Method: The authors generated four service-oriented applications with the Bolt platform and standardized Baxbench prompts, ran them under sustained load for 50 hours, and combined continuous monitoring of resource usage, response time, and throughput with statistical analysis. Contribution/Results: All tested LLM-generated systems exhibited statistically significant aging, with severity varying across application types. The work is presented as the first empirical study to identify and quantify software aging in LLM-generated code, and it provides a foundation for future studies on mitigation strategies and long-term reliability evaluation.

📝 Abstract

Automatically generated software, especially code produced by Large Language Models (LLMs), is increasingly adopted to accelerate development and reduce manual effort. However, little is known about the long-term reliability of such systems under sustained execution. In this paper, we experimentally investigate the phenomenon of software aging in applications generated by LLM-based tools. Using the Bolt platform and standardized prompts from Baxbench, we generated four service-oriented applications and subjected them to 50-hour load tests. Resource usage, response time, and throughput were continuously monitored to detect degradation patterns. The results reveal significant evidence of software aging, including progressive memory growth, increased response time, and performance instability across all applications. Statistical analyses confirm these trends and highlight variability in the severity of aging according to the type of application. Our findings show the need to consider aging in automatically generated software and provide a foundation for future studies on mitigation strategies and long-term reliability evaluation.

Problem

Research questions and friction points this paper is trying to address.

Investigating software aging in LLM-generated software systems

Analyzing long-term reliability degradation under sustained execution

Evaluating performance degradation patterns in automatically generated applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

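Four service-oriented applications generated with the Bolt platform from standardized Baxbench prompts

50-hour load tests with continuous monitoring of resource usage, response time, and throughput

Statistical analyses confirming progressive memory growth and increased response time across all applications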
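
The abstract says that statistical analyses confirm the degradation trends, but this summary does not name the tests the authors used. As an illustration only, the sketch below applies a Mann-Kendall trend test and Sen's slope estimator, two techniques commonly used in software aging studies, to a series of monitoring samples. The function names and the sample values are hypothetical, and the test omits tie correction for brevity.

```python
import math
from statistics import median


def mann_kendall(samples):
    """Mann-Kendall test for a monotonic trend (no tie correction).

    Returns (S, z, p) where p is the two-sided p-value from the
    normal approximation.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    # S counts later-minus-earlier sign agreements over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = samples[j] - samples[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def sen_slope(samples):
    """Median of pairwise slopes: a robust estimate of change per sample."""
    n = len(samples)
    slopes = [
        (samples[j] - samples[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)


if __name__ == "__main__":
    # Hypothetical resident-memory readings (MB), one per monitoring interval.
    memory_mb = [512, 515, 514, 520, 523, 522, 529, 533, 531, 540]
    s, z, p = mann_kendall(memory_mb)
    print(f"S={s}, z={z:.2f}, p={p:.4f}, slope={sen_slope(memory_mb):.2f} MB/sample")
```

A small p-value together with a positive slope would suggest steady memory growth over the run; the same two functions could be applied to response-time or throughput samples.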

🔎 Similar Papers

Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond