Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the cumbersome and error-prone process of manually extracting samples and labels from relational databases for machine learning. To streamline this workflow, the authors propose PQL, a declarative, SQL-inspired domain-specific language that lets users define a predictive task (regression, classification, time-series forecasting, or recommendation) in a single query directly over a relational database, with training labels generated automatically. PQL has two implementations: one optimized for low-latency, small-scale scenarios and another designed for large-scale data processing. The language is already used in real-world applications such as financial fraud detection, item recommendation, and workload prediction, which the authors present as evidence of its versatility.

📝 Abstract

The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, a user's future purchases, a patient's risk of readmission, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples (prediction entities and target labels) from the database, which is slow, laborious, and prone to mistakes. Here, we present the Predictive Query Language (PQL), a SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows specifying a predictive task in a single declarative query, enabling the automatic computation of training labels for a large variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommender systems. PQL is already successfully integrated and used in a collection of use cases as part of a predictive AI platform. The versatility of the language is demonstrated through its many ongoing use cases, including financial fraud, item recommendations, and workload prediction. We demonstrate its versatile design through two implementations: one for small-scale, low-latency use and one that can handle large-scale databases.
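To make the "manual work" concrete, here is a minimal Python sketch of the kind of label extraction the abstract says a single PQL query replaces. It is not PQL syntax and not the authors' implementation. The toy `users` and `orders` tables, the `build_labels` helper, the cutoff date, and the 30-day horizon are all hypothetical, chosen only to illustrate prediction entities (users) and target labels (whether a purchase occurs within a future window).

```python
from datetime import date, timedelta

# Hypothetical toy tables; names and values are illustrative only.
users = [1, 2, 3]
orders = [  # (user_id, order_date)
    (1, date(2024, 1, 5)),
    (1, date(2024, 2, 10)),
    (2, date(2024, 1, 20)),
    (3, date(2024, 3, 15)),
]


def build_labels(users, orders, cutoff, horizon_days):
    """Return {user_id: label}, where label is 1 if the user placed an
    order in the window (cutoff, cutoff + horizon_days], else 0."""
    end = cutoff + timedelta(days=horizon_days)
    labels = {user_id: 0 for user_id in users}
    for user_id, order_date in orders:
        # Only events strictly after the cutoff count toward the label,
        # so the label never uses information from before the cutoff.
        if user_id in labels and cutoff < order_date <= end:
            labels[user_id] = 1
    return labels


labels = build_labels(users, orders, cutoff=date(2024, 2, 1), horizon_days=30)
print(labels)
```

Even in this small case, the window boundaries, the cutoff, and the handling of users with no orders each have to be coded by hand, which is the kind of error-prone detail the abstract describes.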

Problem

Research questions and friction points this paper is trying to address.

predictive modeling

relational databases

training data extraction

machine learning

declarative language

Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictive Query Language

declarative language

relational databases