🤖 AI Summary
Large language models (LLMs) face dual challenges in long-context reasoning: limited context windows and degraded long-range performance. Existing retrieval-augmented generation (RAG) approaches—relying on semantic retrieval or knowledge graphs—prioritize factual recall but fail to capture narrative structures wherein entities evolve temporally and spatially across events. To address this, we propose the Generative Semantic Workspace (GSW), a neuroscience-inspired memory framework that introduces episodic memory mechanisms into RAG for the first time. GSW employs Operators to construct spatiotemporally anchored intermediate semantic representations and a Reconciler to dynamically enforce logical, temporal, and spatial consistency within a generative working memory space. Evaluated on the EpBench benchmark, GSW achieves a 20% accuracy gain over state-of-the-art RAG methods while reducing required context tokens by 51%, significantly lowering inference overhead.
📝 Abstract
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, and performance on texts that do fit degrades with sequence length, necessitating augmentation with external memory frameworks. Current solutions, which have evolved from retrieval over semantic embeddings to more sophisticated structured knowledge graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the **Generative Semantic Workspace** (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an *Operator*, which maps incoming observations to intermediate semantic structures, and a *Reconciler*, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) (Huet et al., 2025), whose corpora range from 100k to 1M tokens, GSW outperforms existing RAG-based baselines by up to **20%**. GSW is also highly efficient, reducing query-time context tokens by **51%** compared to the next most token-efficient baseline, cutting inference-time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.