🤖 AI Summary
Existing robotic control approaches struggle to replicate key biological motor characteristics such as dynamic stability, millisecond-scale reflexes, and temporal memory. This work proposes NeuroVLA, a framework that, for the first time, deploys a brain-inspired vision-language-action system on a physical robot. By emulating the hierarchical neural architecture of the cortex-cerebellum-spinal cord pathway, NeuroVLA coordinates high-level planning, high-frequency feedback stabilization, and rapid motor execution. Without additional training data or special guidance, the system exhibits biomimetic behaviors, including jitter suppression, energy-efficient operation, temporal memory, and safety reflexes triggered in under 20 milliseconds. Evaluated on a real-world robotic platform, NeuroVLA achieves state-of-the-art performance while drawing only 0.4 W on a neuromorphic processor.
📝 Abstract
Recent advances in embodied intelligence have leveraged massive scaling of data and model parameters to master natural-language command following and multi-task control. In contrast, biological systems demonstrate an innate ability to acquire skills rapidly from sparse experience. Crucially, current robotic policies struggle to replicate the dynamic stability, reflexive responsiveness, and temporal memory inherent in biological motion. Here we present Neuromorphic Vision-Language-Action (NeuroVLA), a framework that mimics the structural organization of the biological nervous system across the cortex, cerebellum, and spinal cord. We adopt a system-level bio-inspired design: a high-level model plans goals, an adaptive cerebellum module stabilizes motion using high-frequency sensor feedback, and a bio-inspired spinal layer performs lightning-fast action generation. NeuroVLA represents the first deployment of a neuromorphic VLA on a physical robot, achieving state-of-the-art performance. We observe the emergence of biological motor characteristics without additional data or special guidance: the system suppresses shaking in robotic arms, saves significant energy (only 0.4 W on a neuromorphic processor), shows temporal memory, and triggers safety reflexes in less than 20 milliseconds.
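The three-layer division of labor described above can be illustrated with a toy multi-rate control loop. This is only a minimal sketch under stated assumptions, not the NeuroVLA implementation: the class name `HierarchicalController`, the planning interval, the smoothing gain, and the force threshold are all hypothetical, and a plain proportional correction stands in for the cerebellum module and the neuromorphic spinal layer.

```python
from dataclasses import dataclass


@dataclass
class HierarchicalController:
    """Toy three-layer loop. Every rate, gain, and threshold is an assumption."""

    plan_every: int = 50      # slow layer replans once per 50 fast ticks (hypothetical)
    smoothing: float = 0.2    # feedback gain of the stabilizing layer (hypothetical)
    force_limit: float = 5.0  # reflex threshold in arbitrary units (hypothetical)
    goal: float = 0.0
    tick: int = 0

    def plan(self, instruction_target: float) -> None:
        # Slow "cortex" layer: stand-in for a vision-language model picking a goal.
        self.goal = instruction_target

    def stabilize(self, position: float) -> float:
        # Fast "cerebellum" layer: step a fraction of the way toward the goal
        # using the latest sensor reading, which damps abrupt jumps.
        return position + self.smoothing * (self.goal - position)

    def execute(self, command: float, position: float, force: float) -> float:
        # "Spinal" layer: a reflex that holds position when contact force is
        # too high, without waiting for the slow planner.
        if abs(force) > self.force_limit:
            return position
        return command

    def step(self, instruction_target: float, position: float, force: float) -> float:
        # The planner runs only every `plan_every` ticks; the other two layers
        # run on every tick.
        if self.tick % self.plan_every == 0:
            self.plan(instruction_target)
        self.tick += 1
        command = self.stabilize(position)
        return self.execute(command, position, force)
```

Calling `step` repeatedly with a fixed target moves the position smoothly toward the goal, while a force reading above `force_limit` makes the controller hold its current position on that tick. The point of the sketch is the rate separation: the goal changes rarely, whereas feedback correction and the reflex check happen on every fast tick.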