🤖 AI Summary
This study addresses the limitations of traditional database index tuning tools, such as Microsoft’s Database Tuning Advisor (DTA), which rely on query optimizers’ “what-if” cost estimates that are often inaccurate and lead to suboptimal recommendations. It presents the first systematic evaluation of large language model (LLM)-driven index tuning on real-world enterprise workloads and industrial benchmarks. The findings reveal that LLMs can effectively capture human-like, intuitive optimization reasoning and significantly outperform DTA across multiple scenarios. However, the research also uncovers key challenges, including high performance variance, limited efficacy when directly integrated into existing systems, and substantial validation costs. These insights provide an empirical foundation and concrete directions for synergistically combining LLMs with conventional tuning tools.
📝 Abstract
Index tuning is critical for the performance of modern database systems. Industrial index tuners, such as the Database Tuning Advisor (DTA) developed for Microsoft SQL Server, rely on the "what-if" API provided by the query optimizer to estimate the cost of a query given an index configuration, which can lead to suboptimal recommendations when the estimates are inaccurate. Large language models (LLMs) offer a new approach to index tuning, drawing on knowledge learned from web-scale training datasets. However, the effectiveness of LLM-driven index tuning, especially beyond what is already achieved by commercial index tuners, remains unclear.
In this paper, we study the practical effectiveness of LLM-driven index tuning on both industrial benchmarks and real-world enterprise customer workloads, and compare it with DTA. Our results show that although DTA is generally more reliable, with only a few invocations an LLM can identify configurations whose execution time significantly outperforms those found by DTA in a considerable number of cases, highlighting its potential as a complementary technique. We also observe that the LLM's reasoning captures human-intuitive insights that may be distilled to improve DTA. However, adopting LLM-driven index tuning in production remains challenging due to its substantial performance variance, its limited and often negative impact when directly integrated into DTA, and the high cost of performance validation. This work provides motivation, lessons, and practical insights that we hope will inspire future work on LLM-driven index tuning in both academia and industry.