SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation

📅 2026-03-17
🤖 AI Summary
This work targets efficient, high-quality personalized generation by combining the personalization ability of a small on-device model with the reasoning capacity of a cloud-based large model, without sending raw user context to the cloud. The authors propose an asymmetric edge-cloud collaborative inference framework that recasts speculative decoding as a distributed alignment protocol and frames the collaboration as Bayesian knowledge fusion. The resulting “draft–verify–recover” pipeline uses ratio-based verification to filter logical flaws on the cloud side and steering-based recovery to inject local intent when a draft is rejected. Experiments reported by the authors show improved personalized generation quality and a 2.36× speedup over standard baselines.

📝 Abstract
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning capacity required for high-quality generation. Our pilot study shows that purely local enhancements remain insufficient to reliably bridge this gap. We therefore propose SpecSteer, an asymmetric collaborative inference framework that synergizes private on-device context with cloud-scale reasoning. SpecSteer casts collaboration as Bayesian knowledge fusion and repurposes speculative decoding as a distributed alignment protocol, yielding a Draft–Verify–Recover pipeline: the on-device model drafts personalized sequences; the cloud validates via a ratio-based mechanism that decouples reasoning verification from private context, filtering logical flaws without accessing raw user context; upon rejection, a steering recovery injects local intent during correction. Experiments demonstrate that SpecSteer successfully closes the reasoning gap and achieves superior personalized generation performance, while delivering a 2.36× speedup over standard baselines.
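
For readers unfamiliar with the speculative decoding that SpecSteer repurposes, the following is a minimal sketch of the standard speculative-sampling check for a single drafted token. It is not SpecSteer's ratio-based verification or its steering recovery, whose details the abstract does not give. The function name `verify_token` and the toy distributions are hypothetical and serve only as illustration.

```python
import random


def verify_token(p_target, q_draft, token, rng):
    """Standard speculative-sampling check for one drafted token.

    p_target: next-token probabilities from the large (verifier) model.
    q_draft:  next-token probabilities from the small (draft) model.
    token:    index of the token the draft model proposed (so q_draft[token] > 0).
    Returns (accepted, output_token).
    """
    # Accept the drafted token with probability min(1, p/q).
    ratio = p_target[token] / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return True, token

    # Rejected: resample from the residual distribution max(0, p - q), renormalized.
    # A rejection implies p < q at `token`, so some other entry has p > q
    # and the residual mass is strictly positive.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return False, rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy two-token vocabulary (assumed values, for illustration only).
p = [0.5, 0.5]   # verifier distribution
q = [0.9, 0.1]   # draft distribution
accepted, out = verify_token(p, q, token=0, rng=random.Random(0))
```

In this standard scheme the accept-or-resample rule makes the output token follow the verifier's distribution exactly. Per the abstract, SpecSteer instead uses a ratio-based check that avoids exposing private context and a recovery step that injects local intent.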
Problem

Research questions and friction points this paper is trying to address.

personalized generation
privacy
on-device inference
reasoning capacity
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speculative Decoding
Bayesian Knowledge Fusion
Asymmetric Collaboration
Privacy-Preserving Inference
On-Device Personalization
Authors
Hang Lv (University of Science and Technology of China)
Sheng Liang (CIS LMU Munich & Munich Center for Machine Learning): NLP
Hao Wang (University of Science and Technology of China)
Yongyue Zhang (Nanyang Technological University)
Hongchao Gu (University of Science and Technology of China)
Wei Guo (Huawei Technologies Co., Ltd.)
Defu Lian (University of Science and Technology of China)
Yong Liu (Huawei, NTU, I2R): Recommender Systems, Data Mining, Machine Learning
Enhong Chen (University of Science and Technology of China): data mining, recommender system, machine learning