Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences

📅 2026-03-22
🤖 AI Summary
This work addresses the redundant computation bottleneck in feature extraction during on-device model inference by formally modeling the feature extraction pipeline as a directed acyclic graph (DAG) and introducing graph optimization techniques to enable cross-feature fusion and cross-inference caching. The proposed approach effectively eliminates redundant computations both across features and between consecutive inference requests, significantly accelerating the feature preparation phase prior to on-device inference without compromising model accuracy. Deployment and validation across five industrial-scale mobile services—including search, video, and e-commerce—demonstrate substantial end-to-end latency reductions: 1.33×–3.93× during daytime and 1.43×–4.53× at night.
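The cross-feature fusion idea can be illustrated with a small sketch. This is not the paper's implementation: the `FeatureDAG` class, `Node` structure, and operation names are hypothetical, but they show the general technique of keying DAG nodes by `(operation, inputs)` so that identical extraction subgraphs shared by different features are built only once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str        # extraction operation, e.g. "filter_clicks" (names invented)
    inputs: tuple  # upstream Node objects (empty for source nodes)

class FeatureDAG:
    """Hypothetical DAG builder that fuses structurally identical nodes."""
    def __init__(self):
        self._nodes = {}  # (op, inputs) -> canonical Node

    def add(self, op, *inputs):
        key = (op, tuple(inputs))
        if key not in self._nodes:      # fuse: reuse an identical subgraph
            self._nodes[key] = Node(op, tuple(inputs))
        return self._nodes[key]

    @property
    def size(self):
        return len(self._nodes)

dag = FeatureDAG()
log = dag.add("read_log")
clicks = dag.add("filter_clicks", log)
# Two model features both need a click count over the same log slice:
feat_a = dag.add("count", clicks)
feat_b = dag.add("count", clicks)  # fused with feat_a, not recomputed
assert feat_a is feat_b
print(dag.size)  # 3 unique operations instead of 6 naive ones
```

Deduplicating on `(op, inputs)` is essentially common-subexpression elimination applied to extraction pipelines, which is why fusion cannot change feature values and hence cannot affect model accuracy.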

📝 Abstract
Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining high-quality user experiences. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction to formulate the extraction workflows of different input features as one directed acyclic graph; (2) graph optimization to identify and fuse redundant operation nodes across different features within the graph; and (3) efficient caching to minimize operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning the search, video, and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.
Problem

Research questions and friction points this paper is trying to address.

on-device inference
feature extraction
user behavior sequences
latency optimization
redundant operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature extraction optimization
on-device inference
redundancy elimination
graph-based optimization
efficient caching