CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from memory decay, reasoning drift, and hallucination in multi-turn dialogues, and the conventional remedy of retaining the full conversational history incurs prohibitive computational overhead and context explosion. To address these issues, we propose a cognitively inspired, memory-augmented architecture built around a three-tiered persistent memory system: (1) long-term memory for cross-session strategy consolidation, (2) direct-access memory for session-level note management and long-term knowledge retrieval, and (3) focal attention that dynamically constructs compressed, task-specific contexts. The approach integrates hierarchical memory modeling, context-aware retrieval, dynamic attention-based context reconstruction, and incremental memory compression and update. Evaluated on TurnBench, it significantly reduces reasoning failure rates, achieves linear rather than exponential context-length growth, and improves multi-turn consistency and accuracy, approaching human-level robustness in sequential reasoning.

📝 Abstract
Large language models (LLMs) excel at single-turn reasoning but often lose accuracy and coherence over extended, multi-turn interactions. Recent evaluations such as TurnBench highlight recurring failure modes: reasoning bias, task drift, hallucination, overconfidence, and memory decay. Current approaches typically append full conversational histories, causing unbounded context growth, higher computational costs, and degraded reasoning efficiency. We introduce CogMem, a cognitively inspired, memory-augmented LLM architecture that supports sustained iterative reasoning through structured, persistent memory. CogMem incorporates three layers: a Long-Term Memory (LTM) that consolidates cross-session reasoning strategies; a Direct Access (DA) memory that maintains session-level notes and retrieves relevant long-term memories; and a Focus of Attention (FoA) mechanism that dynamically reconstructs concise, task-relevant context at each turn. Experiments on TurnBench show that this layered design mitigates reasoning failures, controls context growth, and improves consistency across extended reasoning chains, moving toward more reliable, human-like reasoning in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Mitigates reasoning failures in multi-turn LLM interactions
Controls unbounded context growth to reduce computational costs
Improves consistency across extended reasoning chains for reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layered memory architecture with LTM, DA, FoA
Dynamic context reconstruction for task relevance
Persistent memory to control context growth
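The layered LTM/DA/FoA design described above can be sketched as a minimal memory loop. This is an illustrative assumption, not the paper's actual implementation: all class and method names (LongTermMemory, DirectAccess, FocusOfAttention) are hypothetical, keyword overlap stands in for the paper's context-aware retrieval, and a character budget stands in for token-level context compression.

```python
class LongTermMemory:
    """Cross-session store of consolidated reasoning strategies (hypothetical sketch)."""

    def __init__(self):
        self.strategies = []

    def consolidate(self, note):
        # Persist a strategy so later sessions can reuse it.
        self.strategies.append(note)

    def retrieve(self, query):
        # Naive keyword overlap as a stand-in for semantic retrieval.
        return [s for s in self.strategies if any(w in s for w in query.split())]


class DirectAccess:
    """Session-level notes plus a bridge to long-term retrieval."""

    def __init__(self, ltm):
        self.ltm = ltm
        self.notes = []

    def add_note(self, note):
        self.notes.append(note)

    def relevant(self, query, k=3):
        # Merge session notes with retrieved long-term strategies, keep the last k.
        return (self.notes + self.ltm.retrieve(query))[-k:]


class FocusOfAttention:
    """Rebuilds a compact, task-specific context at each turn."""

    def __init__(self, da, budget=200):
        self.da = da
        self.budget = budget  # crude character budget instead of tokens

    def build_context(self, query):
        ctx, used = [], 0
        for item in self.da.relevant(query):
            if used + len(item) > self.budget:
                break  # drop items that would exceed the context budget
            ctx.append(item)
            used += len(item)
        return "\n".join(ctx)


# Example turn: one consolidated strategy, one session note, one rebuilt context.
ltm = LongTermMemory()
ltm.consolidate("strategy: verify each deduction before committing")
da = DirectAccess(ltm)
da.add_note("turn 1: candidate answer is B")
foa = FocusOfAttention(da)
print(foa.build_context("verify deduction"))
```

Because FoA rebuilds the context from the DA layer each turn under a fixed budget, the prompt stays bounded instead of growing with the full dialogue history, which is the mechanism behind the linear context-growth claim.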