InferF: Declarative Factorization of AI/ML Inferences over Joins

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-table join scenarios, AI/ML inference suffers from redundant computation because data records are duplicated across the join output. Method: The paper introduces the first declarative factorized inference optimization framework tailored to multi-table joins. It decomposes ML inference tasks and pushes the resulting sub-computations down the join tree, supporting partial pushdown and factorized execution. Inference workflows are modeled as analyzable expressions; cost-aware greedy and genetic algorithms automatically search for optimal factorization plans; and the framework is integrated with the Velox query engine for efficient execution. Contribution/Results: Evaluated on real-world datasets, the framework achieves up to 11.3× end-to-end speedup. It systematically identifies the key factors governing factorized ML performance, such as join cardinality, predicate selectivity, and model complexity, and establishes a new paradigm for in-database ML inference optimization.

📝 Abstract
Real-world AI/ML workflows often apply inference computations to feature vectors joined from multiple datasets. To avoid the redundant AI/ML computations caused by repeated data records in the join's output, factorized ML has been proposed to decompose ML computations into sub-computations executed on each normalized dataset. However, there is insufficient discussion of how factorized ML could impact AI/ML inference over multi-way joins. To address this limitation, we propose InferF, a novel declarative system focusing on the factorization of arbitrary inference workflows represented as analyzable expressions over multi-way joins. We formalize our problem as flexibly pushing down partial factorized computations to qualified nodes in the join tree to minimize the overall inference computation and join costs, and propose two algorithms to solve it: (1) a greedy algorithm based on a per-node cost function that estimates the influence on overall latency if a subset of factorized computations is pushed to a node, and (2) a genetic algorithm that iteratively enumerates and evaluates promising factorization plans. We implement InferF on Velox, an open-source database engine from Meta, evaluate it on real-world datasets, observe up to 11.3x speedups, and systematically summarize the factors that determine when factorized ML can benefit AI/ML inference workflows.
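To make the core idea concrete, here is a toy sketch (hypothetical code, not InferF's actual API) of why factorization helps: summing a linear model's scores over the join R(k, x) ⋈ S(k, y) can be done by pushing SUM/COUNT aggregates below the join instead of materializing the duplicated join output.

```python
# Toy illustration of factorized inference over a join (illustrative names
# and weights; not InferF's implementation).
from collections import defaultdict

w1, w2 = 0.5, 2.0                              # model weights (assumed)
R = [(1, 10.0), (1, 20.0), (2, 30.0)]          # (join key, feature x)
S = [(1, 1.0), (1, 2.0), (1, 3.0), (2, 4.0)]   # (join key, feature y)

def naive_total(R, S):
    # Materialize the join, then score every (duplicated) output row.
    return sum(w1 * x + w2 * y for kr, x in R for ks, y in S if kr == ks)

def factorized_total(R, S):
    # Push partial SUM and COUNT down to each input relation.
    sum_R, cnt_R = defaultdict(float), defaultdict(int)
    for k, x in R:
        sum_R[k] += x
        cnt_R[k] += 1
    sum_S, cnt_S = defaultdict(float), defaultdict(int)
    for k, y in S:
        sum_S[k] += y
        cnt_S[k] += 1
    # Each x value appears cnt_S[k] times in the join output, and vice versa,
    # so the per-key totals can be combined without materializing the join.
    return sum(w1 * sum_R[k] * cnt_S[k] + w2 * sum_S[k] * cnt_R[k]
               for k in cnt_R.keys() & cnt_S.keys())

assert abs(naive_total(R, S) - factorized_total(R, S)) < 1e-9
```

The factorized version touches each input row once, while the naive version's work grows with the (potentially much larger) join cardinality.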
Problem

Research questions and friction points this paper is trying to address.

Optimizing AI/ML inference over multi-way joins
Minimizing redundant computation caused by record duplication in join outputs
Developing algorithms to reduce inference latency and costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Declarative system factorizes AI inferences over joins
Greedy and genetic algorithms optimize computation pushdown
Achieves speedups by pushing partial computations to join-tree nodes
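The cost-aware greedy search described above can be sketched as follows (a minimal illustration under assumed names and a toy cost model, not InferF's actual implementation): each factorized sub-computation is assigned to the join-tree node with the lowest estimated latency impact.

```python
# Hypothetical greedy pushdown planner: `cost(sub, node)` stands in for
# InferF's per-node cost function estimating the latency impact of pushing
# sub-computation `sub` to join-tree node `node`.

def greedy_plan(subcomputations, nodes, cost):
    """Assign each sub-computation to its cheapest node under `cost`."""
    return {sub: min(nodes, key=lambda n: cost(sub, n))
            for sub in subcomputations}

# Toy cost model (assumed): fewer rows processed means lower cost, so a
# small scan node beats evaluating everything at the large join output.
cardinality = {"scan_R": 1000, "scan_S": 200, "join_root": 50000}
plan = greedy_plan(["sum_x", "sum_y"], list(cardinality),
                   lambda sub, node: cardinality[node])
```

With this toy cost model both sub-computations land on the smallest-cardinality scan node; the paper's actual cost function additionally accounts for predicate selectivity and model complexity.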