Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting in language models trained continually on non-stationary data streams by proposing the TRC² architecture, which introduces a brain-inspired thalamocortical structure at the model level for the first time. TRC² employs sparse thalamic routing to connect modular cortical columns and integrates modulation, prediction, memory, and feedback mechanisms within the decoder backbone. A dedicated fast-correction pathway enables efficient online adaptation without perturbing the slow parameters. Combined with chunk-parallel training and inference and a reproducible continual-learning evaluation suite, TRC² achieves a significantly improved stability-plasticity trade-off at comparable computational cost: it adapts rapidly to new knowledge under streaming domain shifts while effectively preserving previously acquired information.
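The paper does not show code, but the "sparse thalamic routing over cortical columns" idea can be illustrated with a minimal top-k routing sketch: a gate scores each column module, only the k best are evaluated, and their outputs are mixed by a softmax over the selected scores. All names (`thalamic_route`, the toy linear columns and gate) are hypothetical and stand in for whatever routing rule TRC² actually uses.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def thalamic_route(x, columns, gate_weights, k=2):
    """Toy top-k 'thalamic' router: score every cortical column with a
    linear gate, keep the k highest-scoring columns, and mix only their
    outputs (weighted by a softmax over the selected scores)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    active = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mix = softmax([scores[i] for i in active])
    out = [0.0] * len(x)
    for m_w, i in zip(mix, active):
        col_out = columns[i](x)  # only the selected columns run
        out = [o + m_w * c for o, c in zip(out, col_out)]
    return out, active

# Demo: three toy columns that just scale their input.
columns = [lambda x, s=s: [s * x_i for x_i in x] for s in (1.0, 2.0, 3.0)]
gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, active = thalamic_route([1.0, 2.0], columns, gate, k=2)
```

Because only k of the columns execute per token, compute stays roughly constant as columns are added, which is consistent with the "sparse and chunk-parallel" claim in the abstract.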

📝 Abstract
Continual learning is a core requirement for deployed language models, yet standard training and fine-tuning pipelines remain brittle under non-stationary data. Online updates often induce catastrophic forgetting, while methods that improve stability frequently increase latency, memory footprint, or dense computation in ways that do not scale well to long contexts. We introduce TRC$^{2}$ (Thalamically Routed Cortical Columns), a decoder-only backbone that addresses continual learning at the architectural level. TRC$^{2}$ combines sparse thalamic routing over cortical columns with mechanisms for modulation, prediction, memory, and feedback, together with a fast corrective pathway that supports rapid adaptation without destabilizing slower parameters. The resulting block is sparse and chunk-parallel, enabling efficient training and inference while preserving clean ablations of each subsystem. We instantiate a reproducible training and evaluation stack and a continual-learning harness that measures proxy forgetting under streaming domain shifts. Across language modeling and continual learning benchmarks, TRC$^{2}$ improves the stability-plasticity tradeoff at comparable compute, enabling rapid on-stream adaptation while preserving previously acquired behavior.
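The "fast corrective pathway that supports rapid adaptation without destabilizing slower parameters" can be sketched as a fast/slow weight split: slow weights stay frozen during streaming while a small fast correction is updated online. This is a minimal scalar toy under assumed names (`predict`, `online_correct`), not the paper's actual update rule.

```python
def predict(x, w_slow, w_fast):
    """Slow weights are frozen during streaming; the fast pathway adds a
    small learned correction on top (names are illustrative)."""
    return (w_slow + w_fast) * x

def online_correct(stream, w_slow, lr=0.1):
    """Adapt only the fast correction with SGD on squared error, leaving
    w_slow untouched -- a toy stand-in for a fast-correction pathway."""
    w_fast = 0.0
    for x, y in stream:
        err = predict(x, w_slow, w_fast) - y
        w_fast -= lr * err * x  # gradient of 0.5 * err**2 w.r.t. w_fast
    return w_fast
```

Since the slow parameters never receive gradients from the stream, behavior acquired before deployment cannot be overwritten by online updates, which is the stability half of the trade-off the abstract describes.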
Problem

Research questions and friction points this paper is trying to address.

continual learning
catastrophic forgetting
language models
non-stationary data
stability-plasticity tradeoff
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning
sparse routing
thalamocortical architecture
stability-plasticity tradeoff
decoder-only backbone