MemLoRA: Distilling Expert Adapters for On-Device Memory Systems

πŸ“… 2025-12-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high computational cost of deploying large language models (LLMs) locally, the weak memory retention of small language models (SLMs), and the lack of native visual support in existing memory systems, this paper proposes MemLoRA, a modular, LoRA-based memory adapter framework that is the first to introduce knowledge distillation into memory module design, decoupling memory extraction, updating, and generation. The authors further extend it to MemLoRA-V, which natively integrates a lightweight vision-language model to enable on-device multimodal memory augmentation. The approach significantly reduces parameter count and computational overhead without requiring fine-tuning of the backbone model. Experiments demonstrate that MemLoRA outperforms a baseline with 10× more parameters on the LoCoMo benchmark and approaches the performance of a model 60× larger. MemLoRA-V achieves 81.3% accuracy on visual question answering, substantially surpassing caption-based baselines (23.7%).

πŸ“ Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device settings that allow users to keep their conversations and data private. However, memory-augmented systems typically rely on LLMs that are too costly for local on-device deployment. Even though Small Language Models (SLMs) are more suitable for on-device inference than LLMs, they cannot achieve sufficient performance. Additionally, these LLM-based systems lack native visual capabilities, limiting their applicability in multimodal contexts. In this paper, we introduce (i) MemLoRA, a novel memory system that enables local deployment by equipping SLMs with specialized memory adapters, and (ii) its vision extension MemLoRA-V, which integrates small Vision-Language Models (SVLMs) into memory systems, enabling native visual understanding. Following knowledge distillation principles, each adapter is trained separately for a specific memory operation: knowledge extraction, memory update, and memory-augmented generation. Equipped with memory adapters, small models enable accurate on-device memory operations without cloud dependency. On text-only operations, MemLoRA outperforms 10× larger baseline models (e.g., Gemma2-27B) and achieves performance comparable to 60× larger models (e.g., GPT-OSS-120B) on the LoCoMo benchmark. To evaluate visual understanding operations, we extend LoCoMo with challenging Visual Question Answering tasks that require direct visual reasoning. On this benchmark, our VLM-integrated MemLoRA-V shows massive improvements over caption-based approaches (81.3% vs. 23.7% accuracy) while keeping strong performance on text-based tasks, demonstrating the efficacy of our method in multimodal contexts.
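The modular design described in the abstract, with one lightweight adapter per memory operation sharing a frozen SLM backbone, can be sketched as follows. This is an illustrative sketch only: the class name `MemorySystem`, the operation keys, and the toy backbone are assumptions, not the authors' API. In the real system each "adapter" would be a LoRA weight set swapped into the frozen model (e.g., via a library such as PEFT); here adapters are modeled as plain identifiers to show the control flow of decoupled extraction, update, and generation.

```python
# Illustrative sketch of MemLoRA-style adapter routing (not the paper's code).
# One frozen backbone is shared; each memory operation uses its own adapter.

from typing import Callable, Dict, List


class MemorySystem:
    """Routes each memory operation to its dedicated adapter."""

    def __init__(self, backbone: Callable[[str, str], str],
                 adapters: Dict[str, str]):
        self.backbone = backbone      # frozen base SLM (shared by all adapters)
        self.adapters = adapters      # operation name -> adapter identifier
        self.memory: List[str] = []   # stored memory entries

    def _run(self, operation: str, prompt: str) -> str:
        # Activate the adapter for this operation, then run the backbone.
        adapter = self.adapters[operation]
        return self.backbone(adapter, prompt)

    def extract(self, dialogue: str) -> str:
        """Knowledge extraction: distill a fact from a dialogue turn."""
        return self._run("extract", dialogue)

    def update(self, fact: str) -> None:
        """Memory update: decide how a new fact changes stored memory."""
        decision = self._run("update", fact)
        if decision == "ADD":
            self.memory.append(fact)

    def generate(self, question: str) -> str:
        """Memory-augmented generation: answer using stored memory as context."""
        context = " | ".join(self.memory)
        return self._run("generate", f"{context}\n{question}")


# Toy backbone that tags outputs with the active adapter so the routing
# is visible; a real backbone would be an SLM with LoRA weights applied.
def toy_backbone(adapter: str, prompt: str) -> str:
    if adapter == "lora-update":
        return "ADD"  # pretend the update adapter decided to add the fact
    return f"[{adapter}] {prompt}"


system = MemorySystem(
    toy_backbone,
    adapters={"extract": "lora-extract",
              "update": "lora-update",
              "generate": "lora-generate"},
)
system.update("User moved to Berlin")
print(system.generate("Where does the user live?"))
```

The key design point the sketch mirrors is that only the small adapter changes between operations; the backbone stays frozen, which is what keeps the parameter and memory footprint suitable for on-device deployment.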
Problem

Research questions and friction points this paper addresses.

Memory-augmented systems rely on LLMs too costly for on-device deployment.
Small language models alone cannot achieve sufficient memory-operation performance.
Existing memory-augmented systems lack native visual understanding.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses specialized memory adapters for small language models.
Distills knowledge for memory operations via separate adapter training.
Integrates small vision-language models for native visual understanding.
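The distillation mentioned above, training each adapter to imitate a large teacher on one memory operation, typically minimizes a temperature-softened KL divergence between teacher and student output distributions. The following is a minimal, self-contained sketch of that standard objective; the function names and the temperature value are illustrative assumptions, not taken from the paper.

```python
# Sketch of the standard knowledge-distillation objective (illustrative).
# Each MemLoRA-style adapter would minimize a loss of this form against
# a large teacher model's outputs for its specific memory operation.

import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits: List[float],
                    student_logits: List[float],
                    temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions.

    Zero when the student exactly matches the teacher, positive otherwise.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 0.5, -1.0]
print(distillation_kl(teacher, teacher))          # 0.0 (perfect match)
print(distillation_kl(teacher, [0.0, 0.0, 0.0]))  # positive (mismatch)
```

Training one adapter per operation against operation-specific teacher outputs is what lets a small backbone specialize without any full fine-tuning.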