"Let it be Chaos in the Plumbing!" Usage and Efficacy of Chaos Engineering in DevOps Pipelines

📅 2025-09-18
🤖 AI Summary
This study investigates the practical implementation and evolutionary trajectory of Chaos Engineering (CE) within DevOps practices to enhance the resilience of distributed systems in dynamic production environments. Method: A systematic gray literature review is conducted on 50 industry sources published between 2019 and early 2024, identifying application patterns and implementation mechanisms. Contribution/Results: The study proposes a ten-concept classification framework that extends the foundational CE principles, emphasizing controlled experimentation, automated execution, and integrated risk mitigation strategies tailored for agile and DevOps contexts. It reveals a shift from ad hoc fault injection toward continuous, pipeline-embedded resilience validation within CI/CD workflows. The framework provides practitioners with a reusable, context-aware implementation guide and grounds resilience engineering theory in empirical industrial evidence, thereby informing future theoretical development and empirical research.

📝 Abstract
Chaos Engineering (CE) has emerged as a proactive method to improve the resilience of modern distributed systems, particularly within DevOps environments. Originally pioneered by Netflix, CE simulates real-world failures to expose weaknesses before they impact production. In this paper, we present a systematic gray literature review that investigates how industry practitioners have adopted and adapted CE principles over recent years. Analyzing 50 sources published between 2019 and early 2024, we developed a comprehensive classification framework that extends the foundational CE principles into ten distinct concepts. Our study reveals that while the core tenets of CE remain influential, practitioners increasingly emphasize controlled experimentation, automation, and risk mitigation strategies to align with the demands of agile and continuously evolving DevOps pipelines. Our results enhance the understanding of how CE is intended and implemented in practice, and offer guidance for future research and industrial applications aimed at improving system robustness in dynamic production environments.
Problem

Research questions and friction points this paper is trying to address.

Investigating adoption of Chaos Engineering in DevOps pipelines
Classifying extended Chaos Engineering principles from industry practices
Assessing efficacy of controlled experimentation for system resilience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates real-world failures proactively
Extends principles into ten concepts
Emphasizes controlled experimentation and automation
Stefano Fossati
Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands
Damian Andrew Tamburri
Department of Engineering, University of Sannio & JADS/NXP Semiconductors, Benevento 82100, Italy
Massimiliano Di Penta
University of Sannio, Italy
Software Engineering, Mining Software Repositories, Software Evolution, SBSE
Marco Tonnarelli
Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands