FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

📅 2025-04-01
🤖 AI Summary
In online machine learning, feature management faces critical challenges including high latency, severe redundancy, difficulty in real-time updates, and inconsistent cross-source dependency resolution. To address these, we propose the first unified feature management framework that integrates feature intelligence with millisecond-scale dynamic updates, enabling consistent real-time computation over trillion-dimensional feature spaces and complex cross-table dependencies. Built upon OpenMLDB, the framework incorporates a declarative feature DSL, incremental lineage tracking, visual diagnostics, and automated validation, covering the full feature lifecycle from design and computation to verification, lineage analysis, and visualization. Deployed in over 100 production scenarios, it reduces feature update latency to the millisecond level, targets feature management overhead that can account for up to 70% of end-to-end latency in sales prediction services, and improves feature design efficiency for product recommendation and feature computation performance for fraud detection.

📝 Abstract
Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in a sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improves feature computation performance (e.g., for online fraud detection). The code is available at https://github.com/4paradigm/FeatInsight.
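
To make the idea of rapidly updated online features concrete, here is a minimal Python sketch of one common pattern: a per-key sliding-window sum that is refreshed incrementally as each event arrives, rather than recomputed from history. It is an illustration only and not FeatInsight's or OpenMLDB's API; the class `SlidingWindowSum`, the key `card_1`, and the 60-second window are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowSum:
    """Per-key sum of amounts over the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> deque of (timestamp, amount)
        self._sums = defaultdict(float)    # key -> running sum inside the window

    def update(self, key, timestamp, amount):
        """Add one event and return the refreshed feature value for `key`.

        Assumes timestamps for a given key are non-decreasing.
        """
        events = self._events[key]
        events.append((timestamp, amount))
        self._sums[key] += amount
        # Evict events that fell out of the window instead of rescanning history.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, old_amount = events.popleft()
            self._sums[key] -= old_amount
        return self._sums[key]


feature = SlidingWindowSum(window_seconds=60)
values = [
    feature.update("card_1", 0, 10.0),
    feature.update("card_1", 30, 5.0),
    feature.update("card_1", 70, 2.0),  # the event at t=0 is now outside the window
    feature.update("card_2", 70, 1.0),  # keys are tracked independently
]
print(values)  # [10.0, 15.0, 7.0, 1.0]
```

Each update touches only the new event and the expired ones, which is the kind of incremental work that keeps per-request feature latency low; a production engine would additionally handle out-of-order events, persistence, and concurrency.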
Problem

Research questions and friction points this paper is trying to address.

Addresses performance bottlenecks in online ML feature management
Manages large-scale complex data for interdependent feature computation
Enables rapid updates for real-time ML feature deployment
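
The interdependent-feature point above can be sketched in a few lines of Python: derived features are evaluated in dependency order, and the same dependency graph doubles as a simple lineage record. This is a hedged illustration, not how FeatInsight implements it; the feature names (`amount_ratio`, `is_spike`), the `FEATURES` table, and both helper functions are hypothetical. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical feature definitions: name -> (upstream names, function of upstream values).
FEATURES = {
    "amount": ((), None),         # raw column, supplied with the request
    "avg_amount_7d": ((), None),  # raw column, assumed to be precomputed
    "amount_ratio": (("amount", "avg_amount_7d"), lambda a, avg: a / avg),
    "is_spike": (("amount_ratio",), lambda r: r > 3.0),
}


def compute_features(raw_row):
    """Evaluate derived features in dependency order; return values and the graph."""
    graph = {name: set(deps) for name, (deps, _) in FEATURES.items()}
    values = dict(raw_row)
    # static_order() yields every feature after all of its upstream features.
    for name in TopologicalSorter(graph).static_order():
        deps, fn = FEATURES[name]
        if fn is not None:
            values[name] = fn(*(values[d] for d in deps))
    return values, graph


def upstream(graph, name):
    """All transitive inputs of a feature, i.e. a simple form of its lineage."""
    seen = set()
    stack = list(graph[name])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


values, graph = compute_features({"amount": 400.0, "avg_amount_7d": 100.0})
lineage = upstream(graph, "is_spike")
print(values["amount_ratio"], values["is_spike"])  # 4.0 True
print(sorted(lineage))  # ['amount', 'amount_ratio', 'avg_amount_7d']
```

Evaluating in topological order means a derived feature never reads a stale or missing input, and the lineage query shows which upstream columns must be rechecked when a feature changes.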
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online ML feature lifecycle management system
Handles trillion-dimensional feature space efficiently
Enables millisecond-level feature updates
Authors

Xin Tong, Shanghai Jiao Tong Univ.
Xuanhe Zhou, Assistant Professor, Shanghai Jiao Tong University (Data Management, Artificial Intelligence)
Bingsheng He, National Univ. of Singapore
Guoliang Li, Professor, Tsinghua University (Database, Big Data, Crowdsourcing, Data Cleaning & Integration)
Zirui Tang, Shanghai Jiao Tong Univ.
Wei Zhou, Shanghai Jiao Tong Univ.
Fan Wu, Shanghai Jiao Tong Univ.
Mian Lu, 4Paradigm Technology (machine learning systems, GPGPU, high performance computing)
Yuqiang Chen, 4Paradigm Inc.