🤖 AI Summary
To address the high hardware overhead and poor flexibility of implementing multiple activation functions (e.g., Swish, Softmax, SeLU, GeLU) on resource-constrained AI accelerators, this paper proposes DA-VINCI AF, a configurable activation function architecture based on the CORDIC algorithm. DA-VINCI AF introduces a CORDIC-driven dynamic reconfiguration mechanism that enables runtime switching among target functions and adaptive fixed-point precision selection, unifying the hardware implementation of MAC operations, Sigmoid, Tanh, and ReLU. The design culminates in the accumulative NEURIC compute unit, which supports unified acceleration across DNNs, RNNs/LSTMs, and Transformers. Realized in ASIC, the design achieves a quality of results (QoR) of 98.5% while improving energy and area efficiency, and it serves as a core component of resource-efficient vector engines in both academic and commercial AI processors.
📝 Abstract
A CORDIC-based configurable design for Activation Functions (AFs) was previously proposed to accelerate ASIC hardware design for resource-constrained systems by providing functional reconfigurability. Since its introduction, this approach to neural network acceleration has gained widespread popularity, influencing numerous activation function designs in both academic and commercial AI processors. In this retrospective analysis, we revisit the foundations of this initiative, summarize key developments of recent years, and introduce the DA-VINCI AF tailored to the evolving needs of AI applications. This new generation of dynamically configurable, precision-adjustable activation function cores promises greater adaptability across a range of activation functions in AI workloads, including Swish, SoftMax, SeLU, and GeLU, using the Shift-and-Add CORDIC technique. The previously presented design is optimized for MAC, Sigmoid, and Tanh functionality and incorporates ReLU AFs, culminating in the accumulative NEURIC compute unit. These enhancements position NEURIC as a fundamental component of the resource-efficient vector engine for AI accelerators targeting DNNs, RNNs/LSTMs, and Transformers, achieving a quality of results (QoR) of 98.5%.
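To make the Shift-and-Add CORDIC idea concrete, here is a minimal floating-point sketch of hyperbolic CORDIC in rotation mode, the kernel such designs build on for Sigmoid and Tanh. The function names, iteration count, and use of floating point are our illustrative choices, not the paper's implementation; real hardware would use fixed-point shifts and a small lookup table of atanh constants.

```python
import math

def _hyperbolic_shifts(n_iter):
    """Shift sequence 1, 2, 3, 4, 4, 5, ... for hyperbolic CORDIC;
    iterations 4, 13, 40, ... are repeated to guarantee convergence."""
    shifts, i, repeat = [], 1, 4
    while len(shifts) < n_iter:
        shifts.append(i)
        if i == repeat and len(shifts) < n_iter:
            shifts.append(i)        # repeated iteration
            repeat = 3 * repeat + 1
        i += 1
    return shifts

def cordic_tanh(z, n_iter=16):
    """Approximate tanh(z) with hyperbolic CORDIC in rotation mode.
    Only additions and scalings by 2**-i are used, mirroring the
    shift-and-add datapath; converges for roughly |z| <= 1.1."""
    x, y = 1.0, 0.0
    for i in _hyperbolic_shifts(n_iter):
        d = 1.0 if z >= 0.0 else -1.0   # rotate toward z = 0
        e = 2.0 ** -i                   # the "shift" term 2^-i
        x, y = x + d * y * e, y + d * x * e
        z -= d * math.atanh(e)          # hardware reads this from a LUT
    return y / x  # gain factor K cancels in the ratio sinh/cosh

def cordic_sigmoid(z, n_iter=16):
    """sigmoid(z) = (1 + tanh(z/2)) / 2, reusing the same CORDIC core."""
    return 0.5 * (1.0 + cordic_tanh(0.5 * z, n_iter))
```

Because one iterative core yields both `sinh`- and `cosh`-like terms, a single datapath can be reconfigured at runtime to serve several activation functions, which is the flexibility the abstract attributes to the CORDIC approach.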