NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current microprocessor evaluation relies heavily on slow, cycle-accurate simulators driven by unrepresentative benchmark traces. This paper introduces NeuroScalar, a deep learning-based, high-fidelity "in-the-wild" simulation framework that enables cycle-level performance prediction of hypothetical microarchitectures on commodity hardware. Methodologically, NeuroScalar models microarchitecture-independent features so that a trained predictor transfers across processor generations; pairs a lightweight hardware trace collector with a systematic sampling strategy, enabling low-overhead A/B testing in production environments (just 0.1% performance overhead) and scalable deployment; and co-designs the Neutrino on-chip accelerator, which delivers an 85× speedup over the 5 MIPS simulation throughput achieved on a commodity GPU. Collectively, these contributions accelerate processor design iteration and enable efficient, scalable hardware evaluation under realistic workloads.
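The summary above describes a predictor that maps microarchitecture-independent trace features, plus parameters of a candidate design, to cycle-level performance. A minimal sketch of that idea follows; the feature set (instruction mix, dependency distances), design parameters (issue width, ROB size), and the tiny fixed-weight network are all illustrative assumptions, not the paper's actual model, which is trained against cycle-accurate ground truth.

```python
import numpy as np

# Hypothetical microarchitecture-independent features: instruction mix and
# register-dependency distances are properties of the program's dynamic
# trace, not of any particular core, so one model can score many designs.
def trace_features(opcodes, dep_dists):
    mix = np.bincount(opcodes, minlength=4) / len(opcodes)  # load/store/alu/branch
    return np.concatenate([mix, [np.mean(dep_dists), np.std(dep_dists)]])

# A candidate design is described by its parameters and concatenated with
# the trace features, so the same predictor covers hypothetical cores.
def model_input(features, design):
    return np.concatenate([features, design])

# Stand-in predictor: a tiny fixed-weight MLP. The real system would train
# a deep model; this only shows the input/output shape of the approach.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=16), 0.0

def predict_cpi(x):
    h = np.maximum(x @ W1 + b1, 0.0)             # ReLU hidden layer
    return float(np.log1p(np.exp(h @ W2 + b2)))  # softplus keeps CPI positive

feats = trace_features(np.array([0, 2, 2, 3, 1, 2]), np.array([1, 3, 2, 5, 4, 2]))
x = model_input(feats, np.array([4.0, 192.0]))   # issue width 4, ROB 192 (hypothetical)
cpi = predict_cpi(x)
```

Because the features are design-agnostic, traces collected once on today's silicon can be replayed against many hypothetical design vectors, which is what makes "in-the-wild" evaluation of future hardware possible.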


📝 Abstract
The evaluation of new microprocessor designs is constrained by slow, cycle-accurate simulators that rely on unrepresentative benchmark traces. This paper introduces a novel deep learning framework for high-fidelity, "in-the-wild" simulation on production hardware. Our core contribution is a DL model trained on microarchitecture-independent features to predict cycle-level performance for hypothetical processor designs. This approach allows the model to be deployed on existing silicon to evaluate future hardware. We propose a complete system featuring a lightweight hardware trace collector and a principled sampling strategy to minimize user impact. This system achieves a simulation speed of 5 MIPS on a commodity GPU while imposing a mere 0.1% performance overhead. Furthermore, our co-designed Neutrino on-chip accelerator improves performance by 85x over the GPU. We demonstrate that this framework enables accurate performance analysis and large-scale hardware A/B testing using real-world applications.
Problem

Research questions and friction points this paper is trying to address.

How to predict cycle-level performance for new processor designs
How to evaluate future hardware quickly on existing production systems
Traditional cycle-accurate simulation is too slow and its traces unrepresentative
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning model with microarchitecture-independent features
Lightweight hardware trace collector and sampling strategy
On-chip accelerator for 85x performance improvement
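The lightweight trace collector and sampling strategy can be sketched as periodic window sampling: trace short instruction windows at a fixed period sized so collection stays within an overhead budget (the paper reports 0.1%). The window length, period formula, and function names below are illustrative assumptions, not the paper's published parameters.

```python
import numpy as np

# Overhead is roughly the fraction of instructions traced: window / period.
# Solving for the period keeps tracing within the given budget.
def sampling_period(window_insts, overhead_budget):
    return int(window_insts / overhead_budget)

# Systematic sampling: evenly spaced (start, end) trace windows across the
# full execution, so sampled windows are representative of the whole run.
def sample_windows(total_insts, window_insts, overhead_budget, offset=0):
    period = sampling_period(window_insts, overhead_budget)
    starts = np.arange(offset, total_insts - window_insts + 1, period)
    return [(int(s), int(s + window_insts)) for s in starts]

windows = sample_windows(total_insts=1_000_000_000,
                         window_insts=10_000,
                         overhead_budget=0.001)  # 0.1% tracing budget
# Each window is traced in hardware and fed to the DL predictor; the rest
# of execution runs untouched on the production machine.
```

Evenly spaced windows (rather than, say, tracing only at startup) are what let sampled behavior stand in for the full real-world workload at negligible cost.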
Shayne Wadle
University of Wisconsin - Madison
Yanxin Zhang
University of Wisconsin - Madison
Vikas Singh
University of Wisconsin - Madison
Karthikeyan Sankaralingam
Professor of Computer Science, University of Wisconsin-Madison
Computer Architecture