AI Summary
This work proposes the first unified hardware acceleration framework tailored for finite element computation, spiking neural networks, and sparse tensor operations: three representative scientific computing workloads that existing accelerators struggle to support efficiently due to fixed precision assignment, bit-width inflation, and the need for manual sparsity-pattern configuration. Built on a reconfigurable FPGA architecture, the framework introduces a memory-guided mixed-precision strategy, an experience-driven dynamic bit-width management scheme, and an adaptive parallelism mechanism. It further integrates curriculum learning to discover sparsity patterns automatically; because all three workloads run on a single platform, inter-unit data transfer overhead is eliminated. Experimental results show that the proposed approach improves numerical accuracy by 2.8%, throughput by 47%, and energy efficiency by 34% on average across multiple benchmarks, while achieving 45-65% higher throughput than specialized accelerators.
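The summary names a memory-guided mixed-precision strategy without detailing it. A minimal sketch of how such a selector could work: pick the narrowest floating-point format whose unit roundoff, amplified by an estimated condition number, stays under an error tolerance, and keep a "memory" of errors observed on past blocks to veto formats that misbehaved. All names, the error model, and the veto rule here are our own illustrative assumptions, not the paper's design.

```python
FORMATS = {           # bit-width -> unit roundoff (2^-(mantissa bits + 1))
    16: 2.0 ** -11,   # IEEE half: 10-bit mantissa
    32: 2.0 ** -24,   # IEEE single
    64: 2.0 ** -53,   # IEEE double
}

class PrecisionSelector:
    """Hypothetical memory-guided precision selector (illustrative only)."""

    def __init__(self, tol=1e-6):
        self.tol = tol
        # memory: worst error actually measured per format on past blocks
        self.observed_err = {bits: 0.0 for bits in FORMATS}

    def select(self, cond_estimate):
        """Return the narrowest bit-width predicted to meet the tolerance."""
        for bits in sorted(FORMATS):
            predicted = cond_estimate * FORMATS[bits]
            # veto any format whose remembered error already exceeded tol
            if predicted <= self.tol and self.observed_err[bits] <= self.tol:
                return bits
        return max(FORMATS)  # fall back to the widest format

    def record(self, bits, measured_err):
        """Update the memory with an error measured after a block ran."""
        self.observed_err[bits] = max(self.observed_err[bits], measured_err)
```

For a well-conditioned block this picks single precision; raising the condition estimate, or recording a bad measured error for a format, pushes the selector toward wider formats.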
Abstract
Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computation, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations: (1) finite element methods lack comprehensive rounding-error analysis for reduced-precision implementations and use fixed precision assignment strategies that cannot adapt to varying numerical conditioning; (2) spiking neural network accelerators cannot handle non-spike operations and suffer from bit-width escalation as network depth increases; and (3) FPGA tensor accelerators optimize only for dense computations and require manual configuration for each sparsity pattern. To address these challenges, we introduce the **Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing**, a novel framework that integrates three enhanced modules with memory-guided adaptation for efficient mixed-workload processing on a unified platform. Our approach employs memory-guided precision selection to overcome fixed-precision limitations, integrates experience-driven bit-width management and dynamic parallelism adaptation for enhanced spiking neural network acceleration, and introduces curriculum learning for automatic sparsity-pattern discovery. Extensive experiments on FEniCS, COMSOL, and ANSYS benchmarks, the MNIST, CIFAR-10, CIFAR-100, and DVS-Gesture datasets, and COCO 2017 demonstrate a 2.8% improvement in numerical accuracy, a 47% throughput increase, a 34% energy reduction, and 45-65% higher throughput than specialized accelerators. Our work enables unified processing of finite element methods, spiking neural networks, and sparse computations on a single platform while eliminating data transfer overhead between separate units.
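The abstract credits curriculum learning with discovering sparsity patterns automatically instead of requiring manual configuration. A toy sketch of one way a curriculum could drive pattern discovery: ramp the target sparsity up over stages (easy to hard) and prune the smallest-magnitude entries at each stage, so the surviving mask is found from the data. The function name, the schedule, and magnitude pruning itself are our own assumptions; the paper's actual mechanism is not specified here.

```python
def discover_pattern(matrix, final_sparsity=0.75, stages=3):
    """Return a boolean keep-mask found by gradually raising sparsity.

    Hypothetical curriculum: stage s targets final_sparsity * s / stages,
    pruning only the smallest entries still kept from earlier stages.
    """
    mask = [[True] * len(row) for row in matrix]
    n = sum(len(row) for row in matrix)
    for stage in range(1, stages + 1):
        target = final_sparsity * stage / stages      # curriculum schedule
        kept = sorted(abs(v)
                      for row, mrow in zip(matrix, mask)
                      for v, m in zip(row, mrow) if m)
        drop = int(n * target) - (n - len(kept))      # extra entries to drop
        if drop <= 0:
            continue
        cutoff = kept[drop - 1]
        for row, mrow in zip(matrix, mask):
            for j, v in enumerate(row):
                if mrow[j] and abs(v) <= cutoff:
                    mrow[j] = False
        # (in the real framework, a fine-tuning pass would run between stages)
    return mask
```

On a small 2x2 example with `final_sparsity=0.5`, the two smallest-magnitude entries are pruned over two stages, leaving a half-dense mask without any hand-written pattern.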