MorphingDB: A Task-Centric AI-Native DBMS for Model Management and Inference

📅 2025-11-26
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing AI-native databases face two key bottlenecks: model-centric paradigms require labor-intensive manual configuration, raising development costs, while task-centric AutoML approaches incur high computational overhead and integrate poorly with DBMSs. This paper introduces the first PostgreSQL-embedded AI-native DBMS built on a task-centric architecture that automates deep learning model storage, selection, and inference for time-series, text, and image tasks. Its contributions include a multidimensional tensor data type, a two-stage transfer learning framework, a pre-embedding sharing mechanism, and a DAG-based batched inference pipeline. Implemented via LibTorch extensions, the system supports BLOB-based model storage, feature-aware mapping, vectorized sharing, and cost-aware scheduling. Evaluated on nine public datasets, it achieves 3.2× higher inference throughput and reduces GPU memory usage by 57% versus state-of-the-art AI-native DBMSs and AutoML platforms, while maintaining competitive accuracy, latency, and resource efficiency.

๐Ÿ“ Abstract
The increasing demand for deep neural inference within database environments has driven the emergence of AI-native DBMSs. However, existing solutions either rely on model-centric designs requiring developers to manually select, configure, and maintain models, resulting in high development overhead, or adopt task-centric AutoML approaches with high computational costs and poor DBMS integration. We present MorphingDB, a task-centric AI-native DBMS that automates model storage, selection, and inference within PostgreSQL. To enable flexible, I/O-efficient storage of deep learning models, we first introduce specialized schemas and multi-dimensional tensor data types to support BLOB-based all-in-one and decoupled model storage. Then we design a transfer learning framework for model selection in two phases, which builds a transferability subspace via offline embedding of historical tasks and employs online projection through feature-aware mapping for real-time tasks. To further optimize inference throughput, we propose pre-embedding with vector sharing to eliminate redundant computations and DAG-based batch pipelines with cost-aware scheduling to minimize inference time. Implemented as a PostgreSQL extension with LibTorch, MorphingDB outperforms AI-native DBMSs (EvaDB, MADlib, GaussML) and AutoML platforms (AutoGluon, AutoKeras, AutoSklearn) across nine public datasets, encompassing time-series, NLP, and image tasks. Our evaluation demonstrates a robust balance among accuracy, resource consumption, and time cost in model selection, and significant gains in throughput and resource efficiency.
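The two-phase selection idea in the abstract can be sketched in miniature. Everything below is an illustrative assumption, not the paper's actual algorithm: historical tasks are embedded offline into a shared feature space with their best-performing stored model recorded, and a new task's features are projected into that space online, picking the model of the most similar historical task by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Phase 1 (offline): embed historical tasks into a transferability
# subspace and record which stored model performed best on each.
# Embeddings and model names here are made up for illustration.
historical_tasks = [
    {"embedding": [0.9, 0.1, 0.0], "best_model": "resnet18_finetuned"},
    {"embedding": [0.1, 0.8, 0.1], "best_model": "bert_small"},
    {"embedding": [0.0, 0.2, 0.9], "best_model": "lstm_forecaster"},
]

def select_model(task_features):
    """Phase 2 (online): map the new task's features into the same
    space and reuse the best model of the nearest historical task."""
    nearest = max(historical_tasks,
                  key=lambda t: cosine(t["embedding"], task_features))
    return nearest["best_model"]

print(select_model([0.05, 0.15, 0.95]))  # prints "lstm_forecaster"
```

The real system replaces the hand-written vectors with learned offline embeddings and feature-aware mapping, but the offline/online split is the same.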
Problem

Research questions and friction points this paper is trying to address.

Automates model storage, selection, and inference within PostgreSQL databases
Reduces development overhead and computational costs in AI-native DBMS
Optimizes inference throughput while maintaining accuracy and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates model storage and inference in PostgreSQL
Uses transfer learning for efficient model selection
Optimizes throughput with pre-embedding and batch pipelines
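The throughput bullet above combines two ideas that a toy sketch can make concrete. The `embed` step and the two "heads" below are stand-ins of my own invention: the point is that each input's embedding is computed once per batch and shared across all downstream models, rather than recomputed per model.

```python
def embed(x):
    """Toy shared embedding step; call count shows it runs once per input."""
    embed.calls += 1
    return x * 2

embed.calls = 0

# Two downstream "models" that both consume the shared embedding.
sentiment_head = lambda e: "pos" if e > 0 else "neg"
magnitude_head = lambda e: abs(e)

def batched_inference(inputs, heads, batch_size=2):
    results = []
    for i in range(0, len(inputs), batch_size):
        batch = inputs[i:i + batch_size]
        # Pre-embed the whole batch once, then share across all heads.
        embeddings = [embed(x) for x in batch]
        for e in embeddings:
            results.append(tuple(h(e) for h in heads))
    return results

out = batched_inference([1, -2, 3], [sentiment_head, magnitude_head])
print(out)          # [('pos', 2), ('neg', 4), ('pos', 6)]
print(embed.calls)  # 3: one embedding per input, reused by both heads
```

MorphingDB additionally orders such stages as a DAG with cost-aware scheduling; this sketch only shows the redundancy-elimination half.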
Wu Sai
Zhejiang University
Xia Ruichen
Zhejiang University
Yang Dingyu
Zhejiang University
Wang Rui
Zhejiang University
Lai Huihang
Institute of Computing Innovation, Zhejiang University
Guan Jiarui
Zhejiang University
Bai Jiameng
Zhejiang University
Zhang Dongxiang
Zhejiang University
spatial data management, video database
Tang Xiu
Zhejiang University
Xie Zhongle
Zhejiang University
Lu Peng
Zhejiang University
Chen Gang
Zhejiang University