Advancing Multimodal Agent Reasoning with Long-Term Neuro-Symbolic Memory

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing external memory systems for multimodal agents: they rely predominantly on neural representations and vector retrieval, and therefore struggle to support analytical and deductive reasoning. To overcome this, the authors propose NS-Mem, a long-term memory framework for multimodal agents that combines neural memory with explicit symbolic structures and rules. NS-Mem features a three-layer memory architecture comprising episodic, semantic, and logic rule layers, a structured knowledge generation mechanism (SK-Gen) that builds and maintains the memory, and a hybrid retrieval strategy that combines similarity-based search with deterministic symbolic queries. This design lets neural and symbolic mechanisms work together throughout memory construction, maintenance, and retrieval. On real-world multimodal reasoning benchmarks, NS-Mem improves overall reasoning accuracy by 4.35% on average over purely neural memory systems, with gains of up to 12.5% on constrained reasoning queries.

📝 Abstract

Recent advances in large language models have driven the emergence of intelligent agents operating in open-world, multimodal environments. To support long-term reasoning, such agents are typically equipped with external memory systems. However, most existing multimodal agent memories rely primarily on neural representations and vector-based retrieval, which are well suited to inductive, intuitive reasoning but fundamentally limited in supporting the analytical, deductive reasoning that is critical for real-world decision making. To address this limitation, we propose NS-Mem, a long-term neuro-symbolic memory framework designed to advance multimodal agent reasoning by integrating neural memory with explicit symbolic structures and rules. Specifically, NS-Mem is organized around three core components of a memory system: (1) a three-layer memory architecture that consists of an episodic layer, a semantic layer, and a logic rule layer; (2) a memory construction and maintenance mechanism, implemented by SK-Gen, that automatically consolidates structured knowledge from accumulated multimodal experiences and incrementally updates both neural representations and symbolic rules; and (3) a hybrid memory retrieval mechanism that combines similarity-based search with deterministic symbolic query functions to support structured reasoning. Experiments on real-world multimodal reasoning benchmarks demonstrate that NS-Mem achieves an average 4.35% improvement in overall reasoning accuracy over pure neural memory systems, with gains of up to 12.5% on constrained reasoning queries, validating its effectiveness.
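
To make the three-layer idea concrete, here is a minimal toy sketch of a memory with an episodic layer, a semantic layer, and a logic rule layer, queried through both a similarity path and a deterministic symbolic path. It is not the NS-Mem implementation: the abstract does not specify data structures, so the class name ToyNeuroSymbolicMemory, the triple format, the composition-style rules, and the hand-written two-dimensional embeddings are all assumptions made for illustration. SK-Gen is not modeled; the triples are supplied by hand.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Episode:
    """Episodic entry: a raw observation plus a toy embedding (hypothetical format)."""
    text: str
    embedding: list


@dataclass
class ToyNeuroSymbolicMemory:
    """Hypothetical three-layer memory; not the authors' design."""
    episodes: list = field(default_factory=list)  # episodic layer
    facts: set = field(default_factory=set)       # semantic layer: (subject, relation, object)
    rules: list = field(default_factory=list)     # logic rule layer: (r1, r2, r3)

    def add_episode(self, text, embedding, triples=()):
        """Store the raw episode and any hand-supplied structured facts."""
        self.episodes.append(Episode(text, list(embedding)))
        self.facts.update(triples)

    def infer(self):
        """Forward-chain rules to a fixpoint.

        A rule (r1, r2, r3) reads: if (x r1 y) and (y r2 z) then (x r3 z).
        """
        changed = True
        while changed:
            changed = False
            for r1, r2, r3 in self.rules:
                derived = {
                    (x, r3, z)
                    for (x, ra, y) in self.facts if ra == r1
                    for (y2, rb, z) in self.facts if rb == r2 and y2 == y
                }
                new = derived - self.facts
                if new:
                    self.facts |= new
                    changed = True

    def similar(self, query_embedding, k=1):
        """Neural path: the k episodes most similar to the query embedding."""
        ranked = sorted(
            self.episodes,
            key=lambda e: cosine(e.embedding, query_embedding),
            reverse=True,
        )
        return ranked[:k]

    def query(self, subject=None, relation=None, obj=None):
        """Symbolic path: deterministic pattern match; None acts as a wildcard."""
        return sorted(
            (s, r, o)
            for (s, r, o) in self.facts
            if subject in (None, s) and relation in (None, r) and obj in (None, o)
        )


# Containment composes: if x is in y and y is in z, then x is in z.
memory = ToyNeuroSymbolicMemory(rules=[("in", "in", "in")])
memory.add_episode("I put the red mug in the cabinet.", [1.0, 0.0],
                   [("mug", "in", "cabinet")])
memory.add_episode("The cabinet is in the kitchen.", [0.0, 1.0],
                   [("cabinet", "in", "kitchen")])
memory.infer()

nearest = memory.similar([0.9, 0.1], k=1)               # similarity-based search
locations = memory.query(subject="mug", relation="in")  # symbolic query
print(nearest[0].text)
print(locations)
```

In this sketch the similarity path returns the closest stored episode, while the symbolic path also returns the derived fact that the mug is in the kitchen, which appears in no single episode. That contrast is the kind of deductive step the abstract argues vector retrieval alone handles poorly.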

Problem

Research questions and friction points this paper is trying to address.

multimodal agent

long-term memory

neuro-symbolic reasoning

deductive reasoning

symbolic representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic memory

multimodal agent reasoning

symbolic rules