🤖 AI Summary
Existing research lacks systematic empirical analysis of database selection and co-management strategies in microservice architectures. Method: We conducted a large-scale empirical study of approximately 1,000 open-source GitHub projects (2008–2023), covering 14 database categories and 180 distinct database systems, employing open-data collection, multi-dimensional classification, and statistical modeling. Contribution/Results: This is the first study to empirically characterize the evolutionary patterns of database usage in microservices: 52% of systems combine multiple database categories, and architectural complexity correlates with the number of databases; newer systems increasingly adopt key-value and document databases, whereas older systems favor relational databases; niche databases, while not widespread, are often deployed alongside a mainstream type (relational, key-value, document, search). The study yields 18 empirically grounded findings and 9 actionable engineering recommendations, providing both theoretical foundations and practical guidance for heterogeneous distributed data management in microservice environments.
📝 Abstract
Microservices architectures are an integral part of modern software development. Their adoption brings significant changes to database management. Instead of relying on a single database, a microservices architecture is typically composed of multiple smaller, heterogeneous, and distributed databases. In these data-intensive systems, the variety and combination of database categories and technologies play a crucial role in storing and managing data. Although data management in microservices is a major challenge, the research literature on it is scarce.
We present an empirical study on how databases are used in microservices. Using a dataset that we collected and released as open data for future research, which spans 15 years of microservices, we examine ca. 1,000 GitHub projects that use databases drawn from 180 technologies across 14 categories. We perform a comprehensive analysis of current practices, providing researchers and practitioners with empirical evidence to better understand database usage in microservices. We report 18 findings and 9 recommendations. We show that microservices predominantly use Relational, Key-Value, Document, and Search databases. Notably, 52% of microservices combine multiple database categories. Complexity correlates with database count. Older systems favor Relational databases, while newer ones increasingly adopt Key-Value and Document technologies. Niche databases (e.g., EventStoreDB, PostGIS), while not widespread, are often combined with a mainstream one.