🤖 AI Summary
With Moore’s Law delivering diminishing returns, technology scaling alone no longer sustains improvements in the performance and area efficiency of multipliers and multiply-accumulate (MAC) units. This paper proposes DOMAC, a process-aware differentiable architecture optimization framework. Its core idea is to treat multi-stage parallel Wallace/Dadda-type compressor trees as neural-network-like structures, which turns discrete circuit optimization into a continuous, differentiable optimization problem. By integrating process-dependent, differentiable timing and area models, the framework supports end-to-end automatic optimization with mainstream deep learning toolkits. Evaluated across multiple CMOS technology nodes, the method achieves, on average, 18% higher throughput and 23% smaller area than state-of-the-art open-source and commercial IP cores. These gains improve hardware efficiency for compute-intensive applications, particularly AI accelerators, without requiring manual design iteration or technology-specific heuristics.
📝 Abstract
Multipliers and multiply-accumulators (MACs) are fundamental building blocks for compute-intensive applications such as artificial intelligence. With the diminishing returns of Moore's Law, optimizing multiplier performance now requires process-aware architectural innovation rather than reliance on technology scaling alone. In this paper, we introduce DOMAC, a novel approach that uses differentiable optimization to design multipliers and MACs at specific technology nodes. DOMAC establishes an analogy between optimizing multi-stage parallel compressor trees and training deep neural networks. Building on this insight, DOMAC reformulates the discrete optimization problem as a continuous one by incorporating differentiable timing and area objectives. This formulation lets us implement the differentiable solver efficiently with existing deep learning toolkits. Experimental results demonstrate that DOMAC achieves significant improvements in both performance and area efficiency over state-of-the-art baselines and commercial IPs in multiplier and MAC designs.
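
To make the idea of relaxing a discrete choice into a continuous, differentiable one more concrete, the following is a minimal toy sketch and not DOMAC's actual formulation. Everything in it is an assumption made for illustration: the three candidate cells and their (delay, area) numbers, the two-path `PATHS` topology, the `area_weight` trade-off, and the use of a softmax relaxation with a log-sum-exp smooth maximum as the timing objective. A finite-difference gradient stands in for the automatic differentiation that a deep learning toolkit would provide.

```python
import math

# Hypothetical (delay, area) pairs for three candidate cells, ordered from
# fast/large to slow/small. Illustrative numbers, not from any real library.
CANDIDATES = [(1.0, 3.0), (1.6, 2.0), (2.4, 1.0)]

# Hypothetical topology: two parallel paths, each listing the stage indices
# it passes through. Path 0 uses stages 0 and 1; path 1 uses stage 2.
PATHS = [[0, 1], [2]]
NUM_STAGES = 3


def softmax(logits):
    """Turn unconstrained logits into selection probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_max(values, temperature=0.1):
    """Log-sum-exp upper bound on max(values); differentiable everywhere."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values)
    )


def loss(logits, area_weight=0.3):
    """Smoothed critical-path delay plus weighted expected area."""
    probs = [softmax(row) for row in logits]
    exp_delay = [sum(p * d for p, (d, _) in zip(row, CANDIDATES)) for row in probs]
    exp_area = [sum(p * a for p, (_, a) in zip(row, CANDIDATES)) for row in probs]
    path_delays = [sum(exp_delay[s] for s in path) for path in PATHS]
    return smooth_max(path_delays) + area_weight * sum(exp_area)


def numerical_grad(logits, eps=1e-5):
    """Central finite differences; a stand-in for toolkit autodiff."""
    grad = [[0.0] * len(row) for row in logits]
    for i, row in enumerate(logits):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            up = loss(logits)
            row[j] = original - eps
            down = loss(logits)
            row[j] = original
            grad[i][j] = (up - down) / (2 * eps)
    return grad


def optimize(steps=500, lr=0.5):
    """Plain gradient descent on the logits, then argmax to discretize."""
    logits = [[0.0] * len(CANDIDATES) for _ in range(NUM_STAGES)]
    history = [loss(logits)]
    for _ in range(steps):
        grad = numerical_grad(logits)
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                logits[i][j] -= lr * g
        history.append(loss(logits))
    # Recover one discrete cell choice per stage from the relaxed solution.
    choice = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return choice, history


choice, history = optimize()
print("chosen cell per stage:", choice)
print("loss: %.3f -> %.3f" % (history[0], history[-1]))
```

In this toy setup the timing term pushes stages on the longer path toward faster cells, while the area term pushes stages with slack toward smaller ones; the final argmax step is one simple way to map the relaxed solution back to a discrete design, and other legalization strategies are possible.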