🤖 AI Summary
Cloud-based recommender systems suffer from privacy leakage, limited access to real-time signals, and scalability bottlenecks, while on-device recommenders lack the computational resources to model global sequential patterns and retrieve candidates efficiently. To address both sets of limitations, the authors propose CDA4Rec, a cloud-device collaborative dual-agent framework for sequential recommendation. Its core contributions are: (1) task decoupling, which splits recommendation into semantic modeling, candidate retrieval, structured user modeling, and final ranking, and allocates each sub-task to the cloud or the device according to its computational demands and privacy sensitivity; (2) context-aware, personalized execution plans generated by the cloud-side large language model (LLM), which enable dynamic task assignment and partial parallel execution across the two agents; and (3) a hybrid inference architecture that pairs the cloud-side LLM with a lightweight device-side small language model (SLM). Experiments on multiple real-world datasets show that CDA4Rec consistently outperforms competitive baselines in both recommendation accuracy and inference efficiency, while keeping privacy-sensitive processing on the device and remaining responsive in resource-constrained settings.
📝 Abstract
Recent advances in large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities. While LLM-based agents deployed in the cloud offer powerful personalization, they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks. Conversely, on-device agents ensure privacy and responsiveness but lack the computational power for global modeling and large-scale retrieval. To bridge these complementary limitations, we propose CDA4Rec, a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents: a cloud-side LLM and a device-side small language model (SLM). CDA4Rec tackles the core challenge of cloud-device coordination by decomposing the recommendation task into modular sub-tasks including semantic modeling, candidate retrieval, structured user modeling, and final ranking, which are allocated to cloud or device based on computational demands and privacy sensitivity. A strategy planning mechanism leverages the cloud agent's reasoning ability to generate personalized execution plans, enabling context-aware task assignment and partial parallel execution across agents. This design ensures real-time responsiveness, improved efficiency, and fine-grained personalization, even under diverse user states and behavioral sparsity. Extensive experiments across multiple real-world datasets demonstrate that CDA4Rec consistently outperforms competitive baselines in both accuracy and efficiency, validating its effectiveness in heterogeneous and resource-constrained environments.
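The allocation and planning ideas described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cost scores, the `device_budget` threshold, the privacy flags, and the staged `PLAN` are all hypothetical values chosen for illustration. In CDA4Rec the plan would be generated by the cloud-side LLM, whereas here it is hard-coded.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    name: str
    compute_cost: int        # hypothetical score: 1 (light) to 5 (heavy)
    privacy_sensitive: bool  # assumed flag: touches raw on-device user data


def assign(task: SubTask, device_budget: int = 2) -> str:
    """Assumed rule: privacy-sensitive work stays on the device;
    otherwise work heavier than the device budget goes to the cloud."""
    if task.privacy_sensitive:
        return "device"
    return "cloud" if task.compute_cost > device_budget else "device"


# The four sub-tasks named in the abstract; scores and flags are made up.
SUBTASKS = [
    SubTask("semantic_modeling", 5, False),
    SubTask("candidate_retrieval", 4, False),
    SubTask("user_modeling", 2, True),
    SubTask("final_ranking", 2, True),
]

# Hypothetical execution plan: stages run in order, and the sub-tasks
# inside one stage run concurrently (partial parallel execution).
PLAN = [
    ["semantic_modeling"],
    ["candidate_retrieval", "user_modeling"],
    ["final_ranking"],
]


def run_plan(plan, tasks):
    """Resolve each stage to (sub-task, placement) pairs, stage by stage."""
    by_name = {t.name: t for t in tasks}
    trace = []
    with ThreadPoolExecutor() as pool:
        for stage in plan:
            # pool.map preserves the order of the stage's sub-task names.
            placed = list(pool.map(lambda n: (n, assign(by_name[n])), stage))
            trace.append(placed)
    return trace


if __name__ == "__main__":
    for i, stage in enumerate(run_plan(PLAN, SUBTASKS), start=1):
        print(f"stage {i}: {stage}")
```

In this sketch the second stage places candidate retrieval on the cloud and user modeling on the device at the same time, which is one plausible reading of "partial parallel execution across agents". A real plan would also carry data dependencies between stages, which are omitted here.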