🤖 AI Summary
Current database systems have yet to achieve a leap akin to AlphaGo’s “Move 37”: they lack generative reasoning and creative decision-making capabilities. This work proposes Gen-DBA, a generative database agent intended to bring generative intelligence into the AI4DB domain. Built upon a Transformer backbone, a hardware-aware tokenization mechanism, and a two-stage, goal-directed next-token prediction training paradigm, Gen-DBA is envisioned to discover novel optimization strategies autonomously. The study outlines a technical pathway toward a “Move 37” moment for database systems and offers an architectural blueprint for generative database agents in AI4DB.
📝 Abstract
Move 37 marks one of the major breakthroughs in AI: in the two-player board game of Go, an AI system surpassed human expertise and discovered novel strategies beyond traditional gameplay. Natural Language Processing, Computer Vision, and Robotics have undergone a similar shift with the advent of large foundation models in the form of Large Language Models (LLMs), Vision Language Models (VLMs), and Vision Language Action models (VLAs), respectively. In this paper, we investigate the current state of research on Artificial Intelligence for Database Systems (AI4DB) and assess how far AI4DB systems are from achieving their own Move 37 moment. We envision a Generative Database Agent (Gen-DBA, for short) as the pathway to Move 37 for database systems, one that brings generative reasoning and creativity to database learning tasks. This vision paper explores this direction by presenting a recipe for building Gen-DBA that encompasses, but is not limited to, a Transformer backbone, a hardware-grounded tokenization mechanism, a two-stage Goal-Directed Next Token Prediction training paradigm, and a generative inference process.