AI Summary
Continuously updating knowledge in large language models faces significant challenges, including limited context length, high computational costs, and fragmented retrieval. This work systematically investigates the potential of Low-Rank Adaptation (LoRA) as a modular, parameterized memory mechanism, presenting the first empirical analysis of LoRA's design space in terms of knowledge storage capacity, internalization capability, multi-module composition, and long-context reasoning. The study reveals that LoRA serves as a complementary memory paradigm to retrieval-augmented generation (RAG) and in-context learning (ICL), delineating its operational boundaries and practical advantages for modular knowledge integration and efficient updates. These findings establish scalable and composable principles for parameterized knowledge updating in large language models.
Abstract
Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retrieval-Augmented Generation (RAG) are popular, they face constraints in context budgets, costs, and retrieval fragmentation. Departing from these context-dependent paradigms, this work investigates a parametric approach using Low-Rank Adaptation (LoRA) as a modular knowledge memory. Although a few recent works have examined this concept, the fundamental mechanics governing its capacity and composability remain largely unexplored. We bridge this gap through the first systematic empirical study mapping the design space of LoRA-based memory, ranging from characterizing storage capacity and optimizing internalization to scaling multi-module systems and evaluating long-context reasoning. Rather than proposing a single architecture, we provide practical guidance on the operational boundaries of LoRA memory. Overall, our findings position LoRA as a complementary axis of memory alongside RAG and ICL, offering distinct advantages.
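To make the "LoRA as modular memory" idea concrete, the sketch below (a minimal NumPy illustration, not code from the paper) shows the standard LoRA mechanism: a frozen weight matrix `W` augmented by a trainable rank-r update `B @ A`. Because each adapter is a separate low-rank delta, adapters can be attached, detached, or stacked, which is what makes multi-module composition of parameterized memories possible. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden width and LoRA rank (r << d)

W = rng.standard_normal((d, d))        # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.1  # down-projection (trainable)
B = np.zeros((d, r))                   # up-projection, zero-initialized

def forward(x, adapters=()):
    """Base forward pass plus any number of composable rank-r LoRA deltas."""
    y = W @ x
    for B_i, A_i in adapters:
        y = y + B_i @ (A_i @ x)  # each adapter contributes B_i A_i x
    return y

x = rng.standard_normal(d)

# Zero-initialized B makes a fresh adapter a no-op until trained.
assert np.allclose(forward(x), forward(x, adapters=[(B, A)]))

# A "trained" adapter shifts the output by exactly its low-rank delta;
# swapping or stacking such adapters composes multiple stored memories.
B_trained = rng.standard_normal((d, r)) * 0.1
delta = forward(x, adapters=[(B_trained, A)]) - forward(x)
assert np.allclose(delta, B_trained @ (A @ x))
```

Note that storing an adapter costs only `2 * d * r` parameters per matrix rather than `d * d`, which is why capacity (how much knowledge fits in a given rank) becomes the central empirical question the study maps out.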