N-TORC: Native Tensor Optimizer for Real-time Constraints

📅 2025-04-07

🤖 AI Summary

Problem: Existing HLS-based ML accelerator compilers make it hard to minimize hardware area while meeting a real-time latency constraint, because compilation is slow and the area and latency of the resulting deployment are hard to predict. Method: N-TORC combines data-driven performance and resource models with a mixed-integer program (MIP) to choose synthesis settings for each layer of a dataflow architecture generated by compilers such as HLS4ML and FINN, and pairs this with model hyperparameter optimization. Contribution/Results: It produces a set of Pareto-optimal trade-offs between accuracy and hardware cost that satisfy the latency constraint. On the DROPBEAR application, its HLS4ML performance and resource models are more accurate than prior efforts, and its MIP-based solver finds solutions equivalent to those of a stochastic search in 1000× less time.
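To make the notion of a Pareto-optimal trade-off concrete, the following is a minimal sketch that filters a list of candidate designs down to the non-dominated ones. The `(resource_cost, accuracy)` tuples, the function name `pareto_front`, and all numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# Hypothetical candidate designs as (resource_cost, accuracy) pairs.
# The values are made up for illustration only.
Candidate = Tuple[float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep designs that no other design beats on both cost and accuracy."""
    front: List[Candidate] = []
    best_accuracy = float("-inf")
    # Visit designs from cheapest to most expensive; on equal cost,
    # visit the more accurate one first.
    for cost, acc in sorted(candidates, key=lambda c: (c[0], -c[1])):
        # A design survives only if it is more accurate than every
        # cheaper (or equally cheap) design seen so far.
        if acc > best_accuracy:
            front.append((cost, acc))
            best_accuracy = acc
    return front


candidates = [
    (10.0, 0.80),
    (12.0, 0.78),
    (15.0, 0.90),
    (15.0, 0.85),
    (20.0, 0.90),
    (25.0, 0.95),
]
front = pareto_front(candidates)
print(front)
```

Each surviving point costs more than the previous one and is also more accurate, which is the shape of the cost/accuracy trade-off curve described above.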

📝 Abstract

Compared to overlay-based tensor architectures like VTA or Gemmini, compilers that directly translate machine learning models into a dataflow architecture as HLS code, such as HLS4ML and FINN, can generally achieve lower latency by generating customized matrix-vector multipliers and memory structures tailored to the specific fundamental tensor operations required by each layer. However, this approach has significant drawbacks: the compilation process is highly time-consuming and the resulting deployments have unpredictable area and latency, making it impractical to constrain the latency while simultaneously minimizing area. No existing methods address this type of optimization. In this paper, we present N-TORC (Native Tensor Optimizer for Real-Time Constraints), a novel approach that utilizes data-driven performance and resource models to optimize individual layers of a dataflow architecture. When combined with model hyperparameter optimization, N-TORC can quickly generate architectures that satisfy latency constraints while simultaneously optimizing for both accuracy and resource cost (i.e., offering a set of optimal trade-offs between cost and accuracy). To demonstrate its effectiveness, we applied this framework to a cyber-physical application, DROPBEAR (Dynamic Reproduction of Projectiles in Ballistic Environments for Advanced Research). N-TORC's HLS4ML performance and resource models achieve higher accuracy than prior efforts, and its Mixed Integer Program (MIP)-based solver generates equivalent solutions to a stochastic search in 1000X less time.
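The per-layer optimization described in the abstract can be pictured as a selection problem: pick one synthesis option per layer so that total area is minimal while the latency constraint holds. The sketch below solves a toy instance by exhaustive enumeration, which is only a stand-in for the MIP formulation and is not the paper's solver. The layer names, the option tables, and the assumption that end-to-end latency is the sum of per-layer latencies are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical per-layer options as (predicted_latency_us, predicted_area).
# In a real flow these would come from performance and resource models;
# the numbers here are made up.
Option = Tuple[float, float]

layer_options: Dict[str, List[Option]] = {
    "layer_a": [(4.0, 900.0), (8.0, 500.0), (16.0, 300.0)],
    "layer_b": [(10.0, 2000.0), (20.0, 1100.0), (40.0, 700.0)],
    "layer_c": [(2.0, 400.0), (4.0, 250.0)],
}


def select_configs(
    options: Dict[str, List[Option]], latency_budget_us: float
) -> Optional[Tuple[float, float, Dict[str, Option]]]:
    """Choose one option per layer, minimizing area under a latency budget.

    Assumption: end-to-end latency is the sum of per-layer latencies.
    Returns (total_area, total_latency, choice_per_layer), or None when
    no combination meets the budget.
    """
    names = list(options)
    best = None
    # Enumerate every combination of one option per layer.
    for combo in product(*(options[name] for name in names)):
        total_latency = sum(latency for latency, _ in combo)
        if total_latency > latency_budget_us:
            continue  # violates the real-time constraint
        total_area = sum(area for _, area in combo)
        if best is None or total_area < best[0]:
            best = (total_area, total_latency, dict(zip(names, combo)))
    return best


result = select_configs(layer_options, latency_budget_us=32.0)
print(result)
```

Enumeration grows exponentially with the number of layers, which is one reason a MIP solver or another structured search becomes attractive for realistic networks.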

Problem

Research questions and friction points this paper is trying to address.

Optimize dataflow architectures under real-time latency constraints

Balance accuracy and resource cost in tensor operations

Reduce compilation time and improve deployment predictability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-driven performance and resource models

Combines hyperparameter optimization with architecture generation

MIP-based solver for efficient solution generation
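As a rough illustration of what a data-driven performance model means here, the sketch below fits a one-variable least-squares line that predicts layer latency from a single synthesis setting and then queries it for an unseen setting. This is an assumption-laden toy: the choice of a linear model, the use of a reuse factor as the only feature, the helper name `fit_line`, and all measurements are hypothetical and do not describe the models used in the paper.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: a synthesis setting (here, a reuse factor)
# and the latency observed after synthesis. Values are made up.
reuse_factors = [1.0, 2.0, 4.0, 8.0]
measured_latency_cycles = [12.0, 14.0, 18.0, 26.0]

slope, intercept = fit_line(reuse_factors, measured_latency_cycles)

# Predict latency for a setting that was never synthesized.
predicted = slope * 16.0 + intercept
print(slope, intercept, predicted)
```

Once such a model exists, candidate configurations can be scored without running a full HLS compilation for each one, which is what makes a fast search over per-layer settings feasible.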
