Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences

📅 2026-03-22
🤖 AI Summary
This work addresses the redundant computation bottleneck in feature extraction during on-device model inference by formally modeling the feature extraction pipeline as a directed acyclic graph (DAG) and introducing graph optimization techniques to enable cross-feature fusion and cross-inference caching. The proposed approach effectively eliminates redundant computations both across features and between consecutive inference requests, significantly accelerating the feature preparation phase prior to on-device inference without compromising model accuracy. Deployment and validation across five industrial-scale mobile services—including search, video, and e-commerce—demonstrate substantial end-to-end latency reductions: 1.33×–3.93× during daytime and 1.43×–4.53× at night.
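The cross-feature fusion idea can be illustrated with a small sketch. This is not the paper's implementation: the `FeatureDAG` class, `Node` structure, and operation names are hypothetical, but they show the general technique of keying DAG nodes by `(operation, inputs)` so that identical extraction subgraphs shared by different features are built only once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str        # extraction operation, e.g. "filter_clicks" (names invented)
    inputs: tuple  # upstream Node objects (empty for source nodes)

class FeatureDAG:
    """Hypothetical DAG builder that fuses structurally identical nodes."""
    def __init__(self):
        self._nodes = {}  # (op, inputs) -> canonical Node

    def add(self, op, *inputs):
        key = (op, tuple(inputs))
        if key not in self._nodes:      # fuse: reuse an identical subgraph
            self._nodes[key] = Node(op, tuple(inputs))
        return self._nodes[key]

    @property
    def size(self):
        return len(self._nodes)

dag = FeatureDAG()
log = dag.add("read_log")
clicks = dag.add("filter_clicks", log)
# Two model features both need a click count over the same log slice:
feat_a = dag.add("count", clicks)
feat_b = dag.add("count", clicks)  # fused with feat_a, not recomputed
assert feat_a is feat_b
print(dag.size)  # 3 unique operations instead of 6 naive ones
```

Deduplicating on `(op, inputs)` is essentially common-subexpression elimination applied to extraction pipelines, which is why fusion cannot change feature values and hence cannot affect model accuracy.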

📝 Abstract
Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining high-quality user experiences. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction to formulate the extraction workflows of different input features as one directed acyclic graph; (2) graph optimization to identify and fuse redundant operation nodes across different features within the graph; and (3) efficient caching to minimize operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning the search, video, and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.
Problem

Research questions and friction points this paper is trying to address.

on-device inference
feature extraction
user behavior sequences
latency optimization
redundant operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature extraction optimization
on-device inference
redundancy elimination
graph-based optimization
efficient caching