🤖 AI Summary
Contemporary foundation models suffer from fundamental limitations: frequent hallucinations, shallow conceptual understanding, a missing sense of agency and accountability, poor interpretability, and low energy efficiency. To address these, we propose a neuroscience-inspired tripartite enhancement framework comprising tight action integration, hierarchical compositional structure, and dynamic episodic memory. To our knowledge, this is the first systematic proposal to combine action–generation co-modeling, multi-scale abstract action control, retrievable episodic memory, and hierarchical compositional generation in foundation models. Grounded in predictive coding theory, the approach incorporates embodied action modeling, hierarchical variational generative networks, and neurocognitively constrained training. We argue that these additions could substantially suppress hallucinations, deepen causal and conceptual reasoning, improve behavioral controllability and decision interpretability, and reduce inference energy consumption, establishing a novel theoretical pathway and a scalable architectural foundation for next-generation AI systems that are safe, trustworthy, energy-efficient, and endowed with human-like cognitive properties.
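The summary names "hierarchical variational generative networks" without fixing an architecture; one generic instantiation, a minimal sketch assuming a two-level latent hierarchy with notation of our own (not from the paper), would factor the generative distribution as:

```latex
% Illustrative two-level hierarchical latent-variable generative model:
% z_2 is an abstract, higher-level latent; z_1 a lower-level latent; x the observation.
p_\theta(x) \;=\; \int p_\theta(x \mid z_1)\, p_\theta(z_1 \mid z_2)\, p_\theta(z_2)\; dz_1\, dz_2
```

In such a hierarchy, the higher-level latent $z_2$ would carry abstract, compositional structure (e.g., plans or concepts) while $z_1$ captures lower-level detail, mirroring the multi-scale abstraction and control the framework calls for.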
📝 Abstract
The phenomenal advances in large language models (LLMs) and other foundation models over the past few years have been based on optimizing large-scale transformer models on the surprisingly simple objective of minimizing next-token prediction loss, a form of predictive coding that is also the backbone of an increasingly popular model of brain function in neuroscience and cognitive science. However, current foundation models ignore three other important components of state-of-the-art predictive coding models: tight integration of actions with generative models, hierarchical compositional structure, and episodic memory. We propose that to achieve safe, interpretable, energy-efficient, and human-like AI, foundation models should integrate actions, at multiple scales of abstraction, with a compositional generative architecture and episodic memory. We present recent evidence from neuroscience and cognitive science on the importance of each of these components. We describe how the addition of these missing components to foundation models could help address some of their current deficiencies: hallucinations and superficial understanding of concepts due to lack of grounding, a missing sense of agency/responsibility due to lack of control, threats to safety and trustworthiness due to lack of interpretability, and energy inefficiency. We compare our proposal to current trends, such as adding chain-of-thought (CoT) reasoning and retrieval-augmented generation (RAG) to foundation models, and discuss new ways of augmenting these models with brain-inspired components. We conclude by arguing that a rekindling of the historically fruitful exchange of ideas between brain science and AI will help pave the way towards safe and interpretable human-centered AI.
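For concreteness, the "surprisingly simple objective" invoked above is the standard autoregressive cross-entropy loss. The notation below (model parameters $\theta$, token sequence $x_{1:T}$) is ours, added for illustration rather than taken from the paper:

```latex
% Next-token prediction loss for an autoregressive model p_\theta
% over a token sequence x_1, ..., x_T:
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{1:t-1}\right)
```

Under the predictive-coding reading the abstract adopts, each summand is a prediction error on the upcoming token, so training amounts to minimizing accumulated prediction error, the same principle that predictive coding models ascribe to cortical processing.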