FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Spiking Large Language Models (SLLMs) suffer from performance degradation and high computational overhead. This paper proposes FAS, a two-stage ANN-to-SNN conversion framework for LLMs: Stage I performs full-parameter fine-tuning of the pre-trained model, avoiding training from scratch; Stage II applies a coarse-to-fine calibration that reduces conversion errors and recovers accuracy. Across language and vision-language tasks on four different scales of LLMs, the framework achieves state-of-the-art (SOTA) performance. With only eight timesteps, it reaches accuracy 3% higher than the OPT-7B baseline while cutting energy consumption by 96.63%, with corresponding gains in inference speed. The core contribution is a unified design that jointly delivers high accuracy, low temporal latency, and sparse, energy-efficient spiking inference.

📝 Abstract
Spiking Large Language Models have been shown to be a good alternative to LLMs in various scenarios. Existing methods for creating Spiking LLMs, i.e., direct training and ANN-SNN conversion, often suffer from performance degradation and relatively high computational costs. To address these issues, we propose a novel Fast ANN-SNN conversion strategy (FAS) that transforms LLMs into spiking LLMs in two stages. The first stage employs full-parameter fine-tuning of pre-trained models, so it does not need any direct training from scratch. The second stage introduces a coarse-to-fine calibration method to reduce conversion errors and improve accuracy. Our experiments on both language and vision-language tasks across four different scales of LLMs demonstrate that FAS can achieve state-of-the-art performance with significantly reduced inference latency and computational costs. For example, FAS takes only 8 timesteps to achieve accuracy 3% higher than that of the OPT-7B model, while reducing energy consumption by 96.63%.
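The core idea behind ANN-SNN conversion, which FAS accelerates, can be illustrated with a toy example: a ReLU activation in the ANN is replaced by an integrate-and-fire (IF) spiking neuron whose firing rate over T timesteps approximates the ReLU output, so fewer timesteps mean lower latency but larger conversion error. The sketch below is illustrative only (the neuron model, threshold, and soft-reset choice are common conventions, not the paper's implementation):

```python
# Toy illustration of rate-coded ANN-SNN conversion (not the paper's code):
# an IF neuron's average spike output over T timesteps approximates relu(x),
# with error bounded by v_threshold / T, so accuracy vs. latency trades off in T.

def relu(x):
    return max(x, 0.0)

def if_neuron_rate(x, T=8, v_threshold=1.0):
    """Simulate an IF neuron for T timesteps with constant input current x.

    Returns the average output (spike count * threshold / T), which
    approaches relu(x) as T grows. T=8 mirrors the low-latency regime
    reported for FAS; the values here are just for illustration.
    """
    v = 0.0            # membrane potential
    spikes = 0
    for _ in range(T):
        v += x         # integrate the input current
        if v >= v_threshold:
            spikes += 1
            v -= v_threshold   # soft reset (subtraction) keeps residual charge
    return spikes * v_threshold / T

# Conversion error shrinks as the number of timesteps grows:
for T in (2, 8, 64):
    approx = if_neuron_rate(0.37, T=T)
    print(f"T={T:2d}  approx={approx:.4f}  error={abs(approx - relu(0.37)):.4f}")
```

Note that negative inputs never drive the potential over threshold, so the neuron outputs zero, reproducing ReLU's clipping for free; this is why rate-coded conversion pairs so naturally with ReLU-based ANNs.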
Problem

Research questions and friction points this paper is trying to address.

Performance degradation in existing methods for creating Spiking LLMs
High computational cost and inference latency of direct training and conversion
Conversion errors that reduce accuracy at low timestep counts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast two-stage ANN-SNN conversion strategy (FAS)
Full-parameter fine-tuning of pre-trained models, avoiding training from scratch
Coarse-to-fine calibration method to reduce conversion errors
👥 Authors
Long Chen (School of Computer Science, Sichuan University, China)
Xiaotian Song (Sichuan University)
Andy Song (A/Prof of AI, School of Computing Technologies, CIAIRI, RMIT University)
Badong Chen (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China)
Jiancheng Lv (University of Science and Technology of China)
Yanan Sun (School of Computer Science, Sichuan University, China)