AI Summary
Existing AI-native databases face two key bottlenecks: model-centric paradigms require labor-intensive manual configuration, raising development costs, while task-centric AutoML approaches incur high computational overhead and integrate poorly with DBMSs. This paper introduces the first PostgreSQL-embedded AI-native DBMS, adopting a task-centric architecture that automates deep learning model storage, selection, and inference for time-series, text, and image tasks. Its contributions are a multidimensional tensor data type, a two-stage transfer learning framework, a pre-embedding sharing mechanism, and a DAG-based batched inference pipeline. Implemented via LibTorch extensions, the system supports BLOB-based model storage, feature-aware mapping, vectorized sharing, and cost-aware scheduling. Evaluated on nine public datasets, it achieves 3.2× higher inference throughput and reduces GPU memory usage by 57% versus state-of-the-art AI-native DBMSs and AutoML platforms, while maintaining competitive accuracy, latency, and resource efficiency.
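The summary's "DAG-based batched inference pipeline with cost-aware scheduling" can be illustrated with a minimal, self-contained sketch; the cost model, function names, and constants below are illustrative assumptions, not MorphingDB's actual scheduler. The idea: pick a batch size that minimizes estimated total time, trading per-batch launch overhead against a per-item cost that worsens once batches exceed a memory-friendly limit.

```python
import math

# Illustrative cost-aware batch sizing (hypothetical cost model, not
# MorphingDB's actual scheduler). Larger batches amortize per-batch
# launch overhead, but batches beyond `mem_limit` are penalized to
# model GPU memory pressure.

def estimated_time(n, batch_size, launch_overhead, per_item, mem_limit):
    batches = math.ceil(n / batch_size)
    # Simple penalty for exceeding the memory-friendly batch size.
    penalty = 2.0 if batch_size > mem_limit else 1.0
    return batches * launch_overhead + n * per_item * penalty

def pick_batch_size(n, launch_overhead=5.0, per_item=1.0, mem_limit=32):
    # Exhaustively score candidate batch sizes under the cost model.
    return min(range(1, n + 1),
               key=lambda b: estimated_time(n, b, launch_overhead,
                                            per_item, mem_limit))

best = pick_batch_size(100)
print(best)  # smallest batch size achieving the minimal estimated time
```

A real scheduler would calibrate `launch_overhead` and `per_item` from profiled runs and schedule batches along the operator DAG, but the batch-size trade-off it optimizes has this shape.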
Abstract
The increasing demand for deep neural inference within database environments has driven the emergence of AI-native DBMSs. However, existing solutions either rely on model-centric designs that require developers to manually select, configure, and maintain models, resulting in high development overhead, or adopt task-centric AutoML approaches with high computational costs and poor DBMS integration. We present MorphingDB, a task-centric AI-native DBMS that automates model storage, selection, and inference within PostgreSQL. To enable flexible, I/O-efficient storage of deep learning models, we first introduce specialized schemas and a multi-dimensional tensor data type to support BLOB-based all-in-one and decoupled model storage. We then design a two-phase transfer learning framework for model selection, which builds a transferability subspace via offline embedding of historical tasks and employs online projection through feature-aware mapping for real-time tasks. To further optimize inference throughput, we propose pre-embedding with vector sharing to eliminate redundant computations and DAG-based batch pipelines with cost-aware scheduling to minimize inference time. Implemented as a PostgreSQL extension with LibTorch, MorphingDB outperforms AI-native DBMSs (EvaDB, MADlib, GaussML) and AutoML platforms (AutoGluon, AutoKeras, AutoSklearn) across nine public datasets encompassing time-series, NLP, and image tasks. Our evaluation demonstrates a robust balance among accuracy, resource consumption, and time cost in model selection, along with significant gains in throughput and resource efficiency.
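The "pre-embedding with vector sharing to eliminate redundant computations" idea can be sketched as follows; all names (`EmbeddingCache`, the toy encoder, the task heads) are hypothetical stand-ins, not MorphingDB's API. The point is that the expensive encoder runs once per distinct input, and every downstream model head consumes the cached embedding.

```python
# Hypothetical sketch of pre-embedding sharing: cache each input's
# embedding so multiple task heads reuse it instead of re-encoding.

def encode(text):
    # Stand-in for an expensive encoder forward pass: a toy
    # bag-of-characters vector, so the example is self-contained.
    vec = [0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1
    return tuple(vec)

class EmbeddingCache:
    def __init__(self):
        self._cache = {}
        self.encoder_calls = 0  # tracks how often the encoder ran

    def get(self, text):
        if text not in self._cache:
            self.encoder_calls += 1
            self._cache[text] = encode(text)
        return self._cache[text]

def run_models(cache, texts, heads):
    # Each head consumes the shared embedding: the encoder runs once
    # per distinct input, not once per (input, model) pair.
    return {name: [head(cache.get(t)) for t in texts]
            for name, head in heads.items()}

cache = EmbeddingCache()
heads = {
    "length": lambda e: sum(e),
    "vowels": lambda e: e[0] + e[4] + e[8] + e[14] + e[20],
}
out = run_models(cache, ["alpha", "beta", "alpha"], heads)
print(cache.encoder_calls)  # 2: two distinct inputs, not 6 encoder runs
```

With three inputs (one duplicated) and two heads, a naive pipeline would invoke the encoder six times; sharing reduces this to two, which is the redundancy the paper's mechanism targets at much larger scale inside the DBMS.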