🤖 AI Summary
Existing database management system (DBMS) and large language model (LLM) integration approaches lack unified design principles, hindering simultaneous optimization of query efficiency, semantic consistency, and system scalability.
Method: We propose a systematic integration framework grounded in design principles, categorizing DBMS+LLM architectures into five canonical patterns. Leveraging architectural taxonomy and analysis of state-of-the-art systems, we identify critical trade-offs—e.g., inference latency versus data freshness—and pinpoint key performance bottlenecks.
Contribution: We introduce the first integration understanding framework tailored to industrial use cases—including enterprise analytics, intelligent customer service, and data-driven decision-making. We formally articulate three open challenges: scalability, execution efficiency, and semantic consistency. Furthermore, we provide theoretical foundations and practical guidelines for the co-evolution of traditional data management and language-based reasoning. (149 words)
📝 Abstract
Modern enterprises are increasingly driven by the DATA+AI paradigm, in which Database Management Systems (DBMSs) and Large Language Models (LLMs) have become two foundational infrastructures powering a wide range of industrial and business applications, such as enterprise analytics, intelligent customer service, and data-driven decision-making. The efficient integration of DBMSs and LLMs within a unified system offers significant opportunities but also introduces new technical challenges. This paper surveys recent developments in DBMS+LLM integration and identifies key future challenges. Specifically, we categorize five representative architectural patterns based on their core design principles, strengths, and trade-offs. Based on this analysis, we further highlight several critical open challenges. We aim to provide a systematic understanding of the current integration landscape and to outline the unresolved issues that must be addressed to achieve scalable and efficient integration of traditional data management and advanced language reasoning in future intelligent applications.