🤖 AI Summary
To address the high interconnect overhead and low core density of domain-specific instruction-set processors (DSIPs) for machine learning at Ångström-scale fabrication nodes, this work proposes a physically efficient DSIP architecture that tightly integrates customized near-memory storage structures with compact SIMD compute units, substantially reducing routing complexity. Five configurations are synthesized and evaluated with the IMEC A10 nanosheet PDK. Compared with VWR2A, the state-of-the-art DSIP baseline, the design achieves over 2× lower normalized wire length and more than 3× higher core density, with little variation in these metrics across configurations and with minimal manual layout intervention. The architecture thus offers a cost-effective, high-density, low-interconnect-overhead implementation path for Ångström-era DSIPs.
📝 Abstract
This paper presents the physical design exploration of a domain-specific instruction-set processor (DSIP) architecture targeted at machine learning (ML), addressing the challenge of interconnect efficiency in advanced Ångström-era technologies. The design emphasizes reduced wire length and high core density by using specialized memory structures and SIMD (Single Instruction, Multiple Data) units. Five configurations are synthesized and evaluated using the IMEC A10 nanosheet node PDK. Key physical design metrics are compared across configurations and against VWR2A, a state-of-the-art (SoA) DSIP baseline. Results show that our architecture achieves over 2× lower normalized wire length and more than 3× higher density than the SoA baseline, with low variability in these metrics across all configurations, making it a promising solution for next-generation DSIP designs. These improvements are achieved with minimal manual layout intervention, demonstrating the architecture's intrinsic physical efficiency and its potential for low-cost, wire-friendly implementation.