BRYT: Data Rich Analytics Based Computer Architecture for A New Paradigm of Chip Design to Supplant Moore's Law

📅 2023-12-20

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Addressing the on-chip analysis bottleneck in the post-Moore era, this paper introduces a data-rich architectural paradigm to resolve three core challenges in hardware introspection: obfuscated hardware behavior, obfuscated software behavior, and the difficulty of in-situ A/B testing of hardware ideas. Methodologically, it proposes an embedded lightweight analytics processing unit (YPU) that integrates RTL-level telemetry circuits, a streaming analytics engine, and workload-aware compression/triggering mechanisms, with <1% area overhead at 7 nm and <25 mW power consumption. Four case studies demonstrate its contributions: per-instruction cycle-stack tracing without slowdown, in-field evaluation of instruction prefetchers before deployment, fine-grained cycle-by-cycle utilization of hardware modules, and histograms of tensor-value distributions for DL models. Together with an RTL-level prototype, the case studies show the YPU's efficacy and practical utility for analyzing chips in the field.

📝 Abstract

Motivated by the end of Moore's Law and Dennard scaling, which make architectural efficiency the main route to improved capability for the next decade or two, this paper introduces a new data-rich paradigm of chip design for the semiconductor industry. The goal is to monitor chip hardware behavior in the field, at real-time speeds with no slowdowns and minimal power overhead, and to obtain insights into chip behavior and workloads. We posit that such extensive amounts of data would allow better and more capable architectures by addressing three problems: obfuscated hardware, obfuscated software, and the inability to A/B test hardware ideas. This paper implements the first version of the paradigm with a system architecture and the concept of an analYtics Processing Unit (YPU). We perform four case studies and implement an RTL-level prototype. Across the case studies, we show that a YPU with area overhead $<1\%$ at 7 nm and overall power consumption of $<25$ mW can produce previously inconceivable analyses: per-instruction cycle stacks of arbitrary programs, evaluation of instruction prefetchers in the wild before deployment, fine-grained cycle-by-cycle utilization of hardware modules, and histograms of tensor-value distributions of DL models.

Problem

Research questions and friction points this paper is trying to address.

Monitoring chip hardware behavior during real workloads

Enabling hardware introspection without performance slowdowns

Solving A/B testing and obfuscation challenges in chips

Innovation

Methods, ideas, or system contributions that make the work stand out.

YPU enables real-time hardware introspection without slowdowns

YPU achieves less than 1% area overhead at 7 nm

YPU consumes under 25 mW of power for hardware analysis
