AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mobile vector databases face dual challenges on smartphones: stringent SoC resource constraints (limited memory bandwidth, small on-chip memory capacity, and strict data-layout requirements) and dynamic workloads featuring high-frequency insertions/deletions and real-time index updates. To address these, this paper proposes AME, a heterogeneous agentic memory engine tailored for smartphones. AME introduces a hardware-aware matrix computation pipeline and a query-insertion-index co-scheduling mechanism, integrating lightweight embedding compression, multi-level on-chip memory optimization, and Snapdragon 8-series SoC-specific heterogeneous acceleration. Evaluated on HotpotQA, AME achieves, at equal recall, a 1.4× improvement in query throughput, a 7× speedup in index construction, and a 6× increase in concurrent insertion throughput, significantly alleviating the compute underutilization and concurrency bottlenecks inherent in mobile vector databases.

📝 Abstract
On-device agents on smartphones increasingly require continuously evolving memory to support personalized, context-aware, and long-term behaviors. To meet both privacy and responsiveness demands, user data is embedded as vectors and stored in a vector database for fast similarity search. However, most existing vector databases target server-class environments. When ported directly to smartphones, two gaps emerge: (G1) a mismatch between mobile SoC constraints and vector-database assumptions, including tight bandwidth budgets, limited on-chip memory, and stricter data type and layout constraints; and (G2) a workload mismatch, because on-device usage resembles a continuously learning memory, in which queries must coexist with frequent inserts, deletions, and ongoing index maintenance. To address these challenges, we propose AME, an on-device Agentic Memory Engine co-designed with modern smartphone SoCs. AME introduces two key techniques: (1) a hardware-aware, high-efficiency matrix pipeline that maximizes compute-unit utilization and exploits multi-level on-chip storage to sustain high throughput; and (2) a hardware- and workload-aware scheduling scheme that coordinates querying, insertion, and index rebuilding to minimize latency. We implement AME on Snapdragon 8-series SoCs and evaluate it on HotpotQA. In our experiments, AME improves query throughput by up to 1.4x at matched recall, achieves up to 7x faster index construction, and delivers up to 6x higher insertion throughput under concurrent query workloads.
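The core operation such an engine must serve is similarity search over embedded vectors. As a minimal, hardware-agnostic illustration of that operation (plain Python, brute-force cosine scoring over a flat array; the function name and flat-scan approach are our illustrative assumptions, not AME's actual index):

```python
import heapq
from math import sqrt

def top_k(query, vectors, k):
    """Return indices of the k vectors most cosine-similar to the query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sqrt(sum(x * x for x in a))
        nb = sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    # Score every stored vector, then keep the k best (highest similarity first).
    scored = ((cos(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]
```

A real engine replaces this exhaustive scan with an approximate index and, as the paper argues, must map the underlying matrix arithmetic onto the SoC's compute units and on-chip memory hierarchy to sustain throughput.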
Problem

Research questions and friction points this paper is trying to address.

Addressing mobile SoC constraints for vector databases
Resolving workload mismatch in continuous learning memory systems
Optimizing query and insertion throughput on smartphone hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-aware matrix pipeline for high throughput
Workload-aware scheduling to minimize latency
Co-design with smartphone SoCs for efficiency
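The query-insertion-index coordination listed above can be sketched as a toy engine in which inserts land in a buffer that queries also scan, so fresh data stays visible, and the index absorbs the buffer once it grows past a threshold. All names, the threshold, and the eager-rebuild policy here are illustrative assumptions, not AME's actual scheduling mechanism:

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class MemoryEngine:
    """Toy sketch: buffered inserts coexist with queries; rebuild merges the buffer."""

    def __init__(self, rebuild_threshold=4):
        self.index = []    # "built" vectors (stand-in for a real ANN index)
        self.buffer = []   # recent inserts not yet merged into the index
        self.rebuild_threshold = rebuild_threshold

    def insert(self, vec):
        self.buffer.append(vec)
        if len(self.buffer) >= self.rebuild_threshold:
            self.rebuild()

    def rebuild(self):
        # AME schedules this around queries to hide latency; here we merge eagerly.
        self.index.extend(self.buffer)
        self.buffer.clear()

    def query(self, q, k=1):
        # Search both the built index and the insert buffer, so queries see
        # not-yet-indexed data, mirroring the query/insert coexistence above.
        candidates = self.index + self.buffer
        scored = sorted(((_dot(q, v), i) for i, v in enumerate(candidates)),
                        reverse=True)
        return [i for _, i in scored[:k]]
```

The design point this sketch isolates is that queries never block on index maintenance: inserts are absorbed lazily, and the rebuild cost is paid in batches rather than per insertion.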
Xinkui Zhao
Zhejiang University, Hangzhou, China
Qingyu Ma
Zhejiang University, Hangzhou, China
Yifan Zhang
Zhejiang University, Hangzhou, China
Hengxuan Lou
Zhejiang University, Hangzhou, China
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT · Multi-Agent Collaboration · Edge Computing · Data Security and Blockchain · Privacy Protection
Shuiguang Deng
Zhejiang University, Hangzhou, China
Jianwei Yin
Professor of Computer Science and Technology, Zhejiang University
Service Computing · Computer Architecture · Distributed Computing · AI