Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high energy cost that hinders large language model (LLM) deployment on edge devices, this work proposes the first matrix-multiplication-free (MatMul-free) LLM architecture tailored to Intel's Loihi 2 neuromorphic processor. The architecture combines brain-inspired computing principles with hardware-aware quantization, event-driven sparse computation, stateful neural dynamics, and Loihi 2–specific compiler mapping, enabling efficient inference of a 370M-parameter model with no loss of accuracy from quantization. Preliminary experiments show that, compared to a Transformer baseline on an edge GPU, the approach achieves up to 3× higher throughput at roughly half the energy, with markedly better scaling. This work is an early empirical demonstration that spiking neural network (SNN) paradigms can support efficient, scalable LLM inference at the edge.
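The paper's exact formulation is not reproduced in this summary, but the core MatMul-free idea it builds on can be sketched: quantize weights to the ternary set {-1, 0, +1}, so that each output of a linear layer reduces to additions and subtractions of inputs rather than multiply-accumulates. The function names and the per-matrix scaling rule below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ternary_quantize(w):
    """Quantize real-valued weights to {-1, 0, +1} with a per-matrix scale.
    (Illustrative rule: scale by the mean absolute weight, then round/clip.)"""
    scale = np.mean(np.abs(w)) + 1e-8
    w_ternary = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_ternary, scale

def matmul_free_linear(x, w_ternary, scale):
    """With ternary weights, y = x @ W needs no multiplications:
    each output column is (sum of inputs where w=+1) - (sum where w=-1)."""
    cols = []
    for j in range(w_ternary.shape[1]):
        pos = w_ternary[:, j] == 1
        neg = w_ternary[:, j] == -1
        # Masked additions/subtractions replace multiply-accumulate.
        cols.append(x[:, pos].sum(axis=1) - x[:, neg].sum(axis=1))
    return scale * np.stack(cols, axis=1)
```

On hardware such as Loihi 2, this add/subtract structure is what makes low-precision, event-driven execution attractive: the expensive multiplier array of a dense MatMul is no longer needed.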

📝 Abstract
Large language models (LLMs) deliver impressive performance but require large amounts of energy. In this work, we present a MatMul-free LLM architecture adapted for Intel's neuromorphic processor, Loihi 2. Our approach leverages Loihi 2's support for low-precision, event-driven computation and stateful processing. Our hardware-aware quantized model on GPU demonstrates that a 370M parameter MatMul-free model can be quantized with no accuracy loss. Based on preliminary results, we report up to 3x higher throughput with 2x less energy, compared to transformer-based LLMs on an edge GPU, with significantly better scaling. Further hardware optimizations will increase throughput and decrease energy consumption. These results show the potential of neuromorphic hardware for efficient inference and pave the way for efficient reasoning models capable of generating complex, long-form text rapidly and cost-effectively.
Problem

Research questions and friction points this paper is trying to address.

Reduce the energy consumption of large language models
Adapt LLM architectures to neuromorphic hardware
Improve throughput and energy efficiency on edge devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

MatMul-free LLM architecture for Loihi 2
Use of low-precision, event-driven computation
Hardware-aware quantization with no accuracy loss
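The event-driven computation highlighted above can be sketched with a sigma-delta-style encoder, a scheme Loihi-class hardware supports: a unit transmits an event only when its activation has changed by more than a threshold since its last event, so downstream work scales with the number of events rather than the number of units. The function name, state layout, and threshold below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigma_delta_step(x_t, state, threshold=0.5):
    """One event-driven (sigma-delta) encoding step.
    Emits a nonzero 'event' for a unit only when its activation has drifted
    by >= threshold from the last transmitted value; otherwise stays silent."""
    delta = x_t - state["last_sent"]
    fire = np.abs(delta) >= threshold
    events = np.where(fire, delta, 0.0)          # sparse: mostly zeros
    state["last_sent"] = state["last_sent"] + events
    return events, state
```

Accumulating the sparse events reconstructs the signal to within the threshold, which is why stateful, event-driven processing can trade a bounded precision loss for large reductions in traffic and energy.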