AI Summary
Deploying large DeepSeek models on edge devices faces significant challenges due to high computational overhead and energy consumption. This work proposes DSPE, a dedicated inference processor for edge deployment, which achieves substantial improvements in energy efficiency and accuracy through three synergistic innovations: Merkle Tree-based incremental pruning for secure sparse computation, multi-level Boothing lookup for fault-tolerant approximate multiplication, and a dynamic adaptive Posit number format with a tailored hardware multiplier architecture. Implemented in TSMC 28nm CMOS technology, DSPE attains an energy efficiency of 109.4 TFLOPS/W, markedly outperforming existing solutions and providing a scalable hardware foundation for efficient and secure edge deployment of large language models.
Abstract
In recent years, DeepSeek has achieved strong inference performance but remains hard to deploy on energy-constrained edge devices. This paper presents the DeepSeek Processing Element (DSPE), an edge-oriented architecture that alleviates the model's heavy computational and energy demands. DSPE introduces three techniques: the MerkleTree-based Incremental Pruning Scheme (MIPS) for secure redundant-vector reduction, the Multi-Stage Boothing Lookup Method (MBLM) for bit-flip-aware approximate multiplication, and the Dynamic Adaptive Posit Processing Mechanism (DAPPM), which introduces a new DA-Posit format and its corresponding hardware multiplication architecture. Implemented in TSMC 28nm CMOS, DSPE achieves an energy efficiency of 109.4 TFLOPS/W, which compares favorably with state-of-the-art designs, and offers a scalable foundation for edge deployment.
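The abstract does not spell out how MBLM works, so the sketch below is not the paper's method. It only illustrates the classic, exact radix-4 Booth recoding that lookup-based Booth multipliers typically build on: the multiplier is scanned in overlapping 3-bit windows, each window is mapped through a small table to a digit in {-2, -1, 0, 1, 2}, and the matching precomputed multiple of the multiplicand is shifted and accumulated. The names `BOOTH_DIGIT`, `booth_digits`, and `booth_multiply`, as well as the 8-bit default width, are assumptions chosen for illustration.

```python
# Radix-4 Booth digit for each 3-bit window (y[2i+1], y[2i], y[2i-1]).
# Illustrative only: exact Booth recoding, not the approximate MBLM scheme.
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits, LSB first."""
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    # Two's-complement bit pattern with an implicit 0 appended below the LSB.
    pattern = (y & ((1 << bits) - 1)) << 1
    return [BOOTH_DIGIT[(pattern >> (2 * i)) & 0b111] for i in range(bits // 2)]


def booth_multiply(x, y, bits=8):
    """Multiply x by a signed `bits`-wide y using table-selected partial products."""
    # Precomputed multiples of x; hardware would form these with shifts and negation.
    multiples = {d: d * x for d in (-2, -1, 0, 1, 2)}
    return sum(multiples[d] << (2 * i) for i, d in enumerate(booth_digits(y, bits)))


if __name__ == "__main__":
    print(booth_digits(3))
    print(booth_multiply(7, -5))
```

Recoding halves the number of partial products relative to bit-by-bit multiplication (four digits for an 8-bit multiplier), which is the usual motivation for Booth-style designs in energy-constrained hardware.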