🤖 AI Summary
This study addresses the lack of systematic empirical analysis of how database management systems (DBMSs) are adopted and evolve in open-source projects. It examines the code history of 362 popular Java repositories on GitHub, combining source-code heuristics, DB-Engines rankings, ORM detection, and version tracking to uncover long-term DBMS evolution patterns. MySQL and PostgreSQL are the most prevalent relational DBMSs, while Redis and MongoDB show stable usage among non-relational systems. HyperSQL, by contrast, is frequently replaced. “Polyglot persistence”, in which multiple DBMSs coexist in one project and often span both relational and non-relational types, is widespread. Overall, DBMSs differ markedly in how likely they are to be replaced as projects evolve.
📝 Abstract
Database Management Systems (DBMSs) are widely used to store, retrieve, and manage the data handled by modern applications. Although prior work has studied the co-evolution of DBMSs and application source code, less is known about DBMS adoption, co-use, and replacement in real systems. This paper presents a historical study of DBMS usage in 362 popular open-source Java projects hosted on GitHub. We investigated the adoption of the top DBMSs ranked by DB-Engines, covering relational and non-relational systems. Using source-code heuristics, we analyzed DBMS popularity, stability, migration patterns, co-occurrence, and the role of Object-Relational Mappers (ORMs). Our findings show that MySQL and PostgreSQL are the most popular DBMSs in our corpus. Among non-relational DBMSs, Redis and MongoDB are the most frequently used and tend to remain stable after adoption. In contrast, systems such as HyperSQL are more often replaced as projects evolve. We also observed frequent co-use of multiple DBMSs, suggesting patterns of polyglot persistence in which projects combine systems to handle different data needs. Finally, we found that ORM frameworks are commonly used to mediate interactions between applications and DBMSs. Overall, our study provides empirical evidence on how DBMSs are adopted, combined, and replaced over time, offering guidance for developers, architects, educators, and DBMS vendors.