Chaos Engineering in the Wild: Findings from GitHub

📅 2025-05-19
🤖 AI Summary
This study addresses the lack of empirical evidence on how chaos engineering tools are adopted and evolve in open-source projects. It analyzes 971 GitHub repositories that use 10 popular chaos engineering tools, examining tool usage over time, the purpose of each repository, and the types of faults injected. Toxiproxy and Chaos Mesh are the most frequently used tools, with consistent growth since 2016. Fault injection scenarios concentrate on network disruptions (40.9%) and instance termination (32.7%), while application-level faults remain underrepresented (3.0%). Software development is the most common repository purpose (58.0%), ahead of teaching, learning, and research. The findings point to application-level fault injection as an area for future exploration and offer evidence to inform tool design and testing practice.

📝 Abstract
Chaos engineering aims to improve the resilience of software systems by intentionally injecting faults to identify and address system weaknesses that cause outages in production environments. Although many tools for chaos engineering exist, their practical adoption has not yet been explored. This study examines 971 GitHub repositories that incorporate 10 popular chaos engineering tools to identify patterns and trends in their use. The analysis reveals that Toxiproxy and Chaos Mesh are the most frequently used, showing consistent growth since 2016 and reflecting increasing adoption in cloud-native development. The release of new chaos engineering tools peaked in 2018, followed by a shift toward refinement and integration, with Chaos Mesh and LitmusChaos leading in ongoing development activity. Software development is the most frequent application (58.0%), followed by unclassified purposes (16.2%), teaching (10.3%), learning (9.9%), and research (5.7%). Development-focused repositories tend to have higher activity, particularly for Toxiproxy and Chaos Mesh, highlighting their industrial relevance. Fault injection scenarios mainly address network disruptions (40.9%) and instance termination (32.7%), while application-level faults remain underrepresented (3.0%), highlighting an area for future exploration.
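To give a rough sense of scale, the purpose shares reported in the abstract can be converted into approximate repository counts. The sketch below assumes that every percentage uses the full set of 971 repositories as its denominator; the abstract does not state this explicitly, and the names `TOTAL_REPOS`, `purpose_share`, and `approx_counts` are illustrative only. The resulting counts are estimates, not figures reported by the authors.

```python
# Purpose shares as reported in the abstract (percent of repositories).
# Assumption: all shares are computed over the same 971 repositories.
TOTAL_REPOS = 971
purpose_share = {
    "software development": 58.0,
    "unclassified": 16.2,
    "teaching": 10.3,
    "learning": 9.9,
    "research": 5.7,
}


def approx_counts(shares, total):
    """Convert percentage shares into rounded, approximate counts."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = approx_counts(purpose_share, TOTAL_REPOS)
share_sum = round(sum(purpose_share.values()), 1)

for name, n in counts.items():
    print(f"{name}: ~{n} repositories")
print(f"reported shares sum to {share_sum}%")
```

Under this assumption the rounded counts add up to 971, and the shares sum to 100.1%, which is consistent with rounding in the reported figures. The fault-scenario percentages (40.9%, 32.7%, 3.0%) are left out on purpose, since the abstract does not say what their denominator is.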
Problem

Research questions and friction points this paper is trying to address.

Exploring practical adoption of chaos engineering tools on GitHub
Identifying trends in chaos engineering tool usage since 2016
Analyzing fault injection scenarios and underrepresented application-level faults
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes 971 GitHub repositories using chaos tools
Identifies Toxiproxy and Chaos Mesh as top tools
Finds that fault injection concentrates on network and instance faults
Joshua Owotogbe
Jheronimus Academy of Data Science, 's-Hertogenbosch, North Brabant, Netherlands
Indika Kumara
Jheronimus Academy of Data Science, 's-Hertogenbosch, North Brabant, Netherlands
Dario Di Nucci
Associate Professor, University of Salerno, Italy
Software Engineering, Data Science, DevOps
D. Tamburri
Jheronimus Academy of Data Science, 's-Hertogenbosch, North Brabant, Netherlands and University of Sannio, Benevento, Italy
Willem-Jan van den Heuvel
Jheronimus Academy of Data Science, 's-Hertogenbosch, North Brabant, Netherlands