NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current microprocessor evaluation relies heavily on slow, cycle-accurate simulators driven by unrepresentative benchmark traces. This paper introduces NeuroScalar, a deep learning-based, high-fidelity "in-the-wild" simulation framework that enables cycle-level performance prediction of hypothetical microarchitectures on commodity hardware. Methodologically, NeuroScalar models microarchitecture-independent features so that a trained predictor transfers across processor generations; pairs a lightweight hardware trace collector with a systematic sampling strategy, enabling low-overhead A/B testing in production environments (just 0.1% performance overhead) and scalable deployment; and co-designs the Neutrino on-chip accelerator, which delivers an 85× speedup over the 5 MIPS simulation throughput achieved on a commodity GPU. Collectively, these contributions accelerate processor design iteration and enable efficient, scalable hardware evaluation under realistic workloads.
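The summary above describes a predictor that maps microarchitecture-independent trace features, plus parameters of a candidate design, to cycle-level performance. A minimal sketch of that idea follows; the feature set (instruction mix, dependency distances), design parameters (issue width, ROB size), and the tiny fixed-weight network are all illustrative assumptions, not the paper's actual model, which is trained against cycle-accurate ground truth.

```python
import numpy as np

# Hypothetical microarchitecture-independent features: instruction mix and
# register-dependency distances are properties of the program's dynamic
# trace, not of any particular core, so one model can score many designs.
def trace_features(opcodes, dep_dists):
    mix = np.bincount(opcodes, minlength=4) / len(opcodes)  # load/store/alu/branch
    return np.concatenate([mix, [np.mean(dep_dists), np.std(dep_dists)]])

# A candidate design is described by its parameters and concatenated with
# the trace features, so the same predictor covers hypothetical cores.
def model_input(features, design):
    return np.concatenate([features, design])

# Stand-in predictor: a tiny fixed-weight MLP. The real system would train
# a deep model; this only shows the input/output shape of the approach.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=16), 0.0

def predict_cpi(x):
    h = np.maximum(x @ W1 + b1, 0.0)             # ReLU hidden layer
    return float(np.log1p(np.exp(h @ W2 + b2)))  # softplus keeps CPI positive

feats = trace_features(np.array([0, 2, 2, 3, 1, 2]), np.array([1, 3, 2, 5, 4, 2]))
x = model_input(feats, np.array([4.0, 192.0]))   # issue width 4, ROB 192 (hypothetical)
cpi = predict_cpi(x)
```

Because the features are design-agnostic, traces collected once on today's silicon can be replayed against many hypothetical design vectors, which is what makes "in-the-wild" evaluation of future hardware possible.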


📝 Abstract
The evaluation of new microprocessor designs is constrained by slow, cycle-accurate simulators that rely on unrepresentative benchmark traces. This paper introduces a novel deep learning framework for high-fidelity, "in-the-wild" simulation on production hardware. Our core contribution is a DL model trained on microarchitecture-independent features to predict cycle-level performance for hypothetical processor designs. This approach allows the model to be deployed on existing silicon to evaluate future hardware. We propose a complete system featuring a lightweight hardware trace collector and a principled sampling strategy to minimize user impact. This system achieves a simulation speed of 5 MIPS on a commodity GPU while imposing a mere 0.1% performance overhead. Furthermore, our co-designed Neutrino on-chip accelerator improves performance by 85x over the GPU. We demonstrate that this framework enables accurate performance analysis and large-scale hardware A/B testing using real-world applications.
Problem

Research questions and friction points this paper is trying to address.

How to predict cycle-level performance for new processor designs
How to evaluate future hardware quickly on existing production systems
Traditional cycle-accurate simulation is too slow and its traces unrepresentative
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning model with microarchitecture-independent features
Lightweight hardware trace collector and sampling strategy
On-chip accelerator for 85x performance improvement
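The lightweight trace collector and sampling strategy can be sketched as periodic window sampling: trace short instruction windows at a fixed period sized so collection stays within an overhead budget (the paper reports 0.1%). The window length, period formula, and function names below are illustrative assumptions, not the paper's published parameters.

```python
import numpy as np

# Overhead is roughly the fraction of instructions traced: window / period.
# Solving for the period keeps tracing within the given budget.
def sampling_period(window_insts, overhead_budget):
    return int(window_insts / overhead_budget)

# Systematic sampling: evenly spaced (start, end) trace windows across the
# full execution, so sampled windows are representative of the whole run.
def sample_windows(total_insts, window_insts, overhead_budget, offset=0):
    period = sampling_period(window_insts, overhead_budget)
    starts = np.arange(offset, total_insts - window_insts + 1, period)
    return [(int(s), int(s + window_insts)) for s in starts]

windows = sample_windows(total_insts=1_000_000_000,
                         window_insts=10_000,
                         overhead_budget=0.001)  # 0.1% tracing budget
# Each window is traced in hardware and fed to the DL predictor; the rest
# of execution runs untouched on the production machine.
```

Evenly spaced windows (rather than, say, tracing only at startup) are what let sampled behavior stand in for the full real-world workload at negligible cost.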
Shayne Wadle
University of Wisconsin - Madison
Yanxin Zhang
University of Wisconsin - Madison
Vikas Singh
University of Wisconsin - Madison
Karthikeyan Sankaralingam
Professor of Computer Science, University of Wisconsin-Madison
Computer Architecture