The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
How can large language models achieve Transformer-level performance while remaining biologically interpretable and neuroscientifically plausible? This paper introduces Dragon Hatchling (BDH), a brain-inspired architecture built as a scale-free network of locally interacting neuron particles, combining spiking-neuron dynamics, Hebbian synaptic plasticity, and attention-based state-space modeling in a GPU-friendly formulation. Its key contribution is unifying three biologically grounded properties within a single language-modeling framework: sparse, positive (and often monosemantic) neuronal activations, synapse-level traceability of working memory, and a modular neuron interaction graph with a heavy-tailed degree distribution. Evaluated at model sizes from 10M to 1B parameters, BDH matches GPT-2's performance on language modeling and machine translation tasks with the same training data, exhibiting Transformer-like scaling laws. BDH thus offers a paradigm bridging language modeling with cognitive neuroscience.
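The summary's central mechanism is that working memory lives in the synapses themselves: when two neurons fire together while processing a concept, the synapse between them strengthens. A minimal sketch of such a Hebbian update with decay is below; the function name, hyperparameters, and update form are illustrative assumptions, not the paper's actual equations.

```python
import numpy as np

def hebbian_step(sigma, y_pre, y_post, eta=0.1, decay=0.99):
    """One Hebbian update: synapses between co-active neurons strengthen.

    sigma  : (n, n) synaptic state matrix (the "working memory")
    y_pre  : (n,) sparse, non-negative pre-synaptic activations
    y_post : (n,) sparse, non-negative post-synaptic activations
    eta and decay are illustrative hyperparameters, not from the paper.
    """
    return decay * sigma + eta * np.outer(y_post, y_pre)

# Toy run: neuron 2 fires together with neuron 5 across several tokens,
# so the specific synapse sigma[5, 2] strengthens while others stay at zero.
n = 8
sigma = np.zeros((n, n))
for _ in range(5):
    y_pre = np.zeros(n); y_pre[2] = 1.0
    y_post = np.zeros(n); y_post[5] = 1.0
    sigma = hebbian_step(sigma, y_pre, y_post)
```

This mirrors the paper's empirical claim that specific, individual synapses strengthen whenever the model processes a specific concept: inspecting `sigma` directly tells you which neuron pairs have recently co-fired.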

📝 Abstract
The relationship between computing systems and the brain has served as motivation for pioneering theoreticians since John von Neumann and Alan Turing. Uniform, scale-free biological networks, such as the brain, have powerful properties, including generalizing over time, which is the main barrier for Machine Learning on the path to Universal Reasoning Models. We introduce `Dragon Hatchling' (BDH), a new Large Language Model architecture based on a scale-free biologically inspired network of $n$ locally-interacting neuron particles. BDH couples strong theoretical foundations and inherent interpretability without sacrificing Transformer-like performance. BDH is a practical, performant state-of-the-art attention-based state space sequence learning architecture. In addition to being a graph model, BDH admits a GPU-friendly formulation. It exhibits Transformer-like scaling laws: empirically BDH rivals GPT2 performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data. BDH can be represented as a brain model. The working memory of BDH during inference entirely relies on synaptic plasticity with Hebbian learning using spiking neurons. We confirm empirically that specific, individual synapses strengthen connection whenever BDH hears or reasons about a specific concept while processing language inputs. The neuron interaction network of BDH is a graph of high modularity with heavy-tailed degree distribution. The BDH model is biologically plausible, explaining one possible mechanism which human neurons could use to achieve speech. BDH is designed for interpretability. Activation vectors of BDH are sparse and positive. We demonstrate monosemanticity in BDH on language tasks. Interpretability of state, which goes beyond interpretability of neurons and model parameters, is an inherent feature of the BDH architecture.
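The abstract states that BDH's activation vectors are sparse and positive, which is what makes per-neuron interpretability tractable. A thresholded ReLU over a linear map is one minimal way to obtain such vectors; this is a sketch under that assumption, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_positive_activation(x, W, theta=0.5):
    """Thresholded ReLU: subtract a threshold, clip at zero.

    The result is non-negative by construction, and the threshold theta
    (an illustrative choice) silences most neurons, giving sparsity.
    """
    return np.maximum(W @ x - theta, 0.0)

x = rng.normal(size=64)                 # toy input
W = rng.normal(size=(256, 64)) / 8.0    # toy weights, scaled for unit variance
y = sparse_positive_activation(x, W)
```

With sparse, non-negative activations, a nonzero entry of `y` is an unambiguous "this neuron fired" signal, which is the precondition for the monosemanticity analysis the abstract describes.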
Problem

Research questions and friction points this paper is trying to address.

Developing a biologically plausible brain model that rivals Transformer performance
Creating an interpretable AI architecture using scale-free neural networks
Bridging the gap between artificial intelligence and biological brain mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scale-free biologically inspired network of neuron particles
GPU-friendly graph model with Transformer-like scaling laws
Interpretable architecture using synaptic plasticity and spiking neurons
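The Innovation list highlights a scale-free network with a heavy-tailed degree distribution. Preferential attachment is the classic way such distributions arise; the sketch below (Barabási-Albert style, stdlib only) illustrates the property, though BDH's actual graph construction may differ.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph where each new node attaches m edges, preferring
    high-degree nodes. Sampling endpoints from a list in which every
    node appears once per incident edge implements degree-proportional
    choice. Returns the edge list."""
    rng = random.Random(seed)
    targets = list(range(m))   # endpoint pool, nodes repeated by degree
    edges = []
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:          # m distinct neighbors for v
            chosen.add(rng.choice(targets))
        for u in chosen:
            edges.append((v, u))
            targets.extend([v, u])      # both endpoints gain degree
    return edges

edges = preferential_attachment(1000)
deg = Counter()
for v, u in edges:
    deg[v] += 1
    deg[u] += 1
# Heavy tail: a few hub nodes end up with degree far above the mean.
```

Early nodes accumulate far more connections than latecomers, producing the hub-dominated, high-modularity structure the paper attributes to BDH's neuron interaction graph.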