N-TORC: Native Tensor Optimizer for Real-time Constraints

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Compilers that translate ML models directly into HLS dataflow accelerators, such as HLS4ML and FINN, have long compilation times and produce deployments with unpredictable area and latency, which makes it impractical to minimize area under a real-time latency constraint. Method: N-TORC combines data-driven performance and resource models with a mixed-integer programming (MIP) solver to optimize individual layers of a dataflow architecture, and pairs this with model hyperparameter optimization. Contribution/Results: The framework generates architectures that satisfy the latency constraint and offer a set of optimal trade-offs between accuracy and resource cost. On the DROPBEAR cyber-physical application, its HLS4ML performance and resource models are more accurate than prior efforts, and its MIP-based solver produces solutions equivalent to those of a stochastic search in 1000× less time.

📝 Abstract
Compared to overlay-based tensor architectures like VTA or Gemmini, compilers that directly translate machine learning models into a dataflow architecture as HLS code, such as HLS4ML and FINN, generally can achieve lower latency by generating customized matrix-vector multipliers and memory structures tailored to the specific fundamental tensor operations required by each layer. However, this approach has significant drawbacks: the compilation process is highly time-consuming and the resulting deployments have unpredictable area and latency, making it impractical to constrain the latency while simultaneously minimizing area. Currently, no existing methods address this type of optimization. In this paper, we present N-TORC (Native Tensor Optimizer for Real-Time Constraints), a novel approach that utilizes data-driven performance and resource models to optimize individual layers of a dataflow architecture. When combined with model hyperparameter optimization, N-TORC can quickly generate architectures that satisfy latency constraints while simultaneously optimizing for both accuracy and resource cost (i.e. offering a set of optimal trade-offs between cost and accuracy). To demonstrate its effectiveness, we applied this framework to a cyber-physical application, DROPBEAR (Dynamic Reproduction of Projectiles in Ballistic Environments for Advanced Research). N-TORC's HLS4ML performance and resource models achieve higher accuracy than prior efforts, and its Mixed Integer Program (MIP)-based solver generates equivalent solutions to a stochastic search in 1000X less time.
Problem

Research questions and friction points this paper is trying to address.

Optimize dataflow architecture for real-time latency constraints
Balance accuracy and resource cost in tensor operations
Reduce compilation time and improve deployment predictability
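
The first of these problems can be illustrated with a toy version of the selection task: pick one configuration per layer so that total latency stays within a budget while total area is minimized. The sketch below is not the paper's MIP solver; it enumerates every combination exhaustively, which is only practical for tiny examples. The option tables, the "reuse factor" labels, the numbers, and the assumption that latency and area simply add across layers are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical per-layer options: (reuse_factor, latency_cycles, area_units).
# All values are invented for illustration; none come from the paper.
LAYER_OPTIONS = [
    [(1, 10, 400), (2, 18, 220), (4, 34, 120)],  # layer 0
    [(1, 20, 800), (2, 38, 420), (4, 74, 230)],  # layer 1
    [(1, 5, 100), (2, 9, 60), (4, 17, 35)],      # layer 2
]


def min_area_under_latency(layer_options, latency_budget):
    """Exhaustively search one option per layer.

    Returns (total_area, total_latency, chosen_reuse_factors) for the
    lowest-area combination whose summed latency fits the budget,
    or None if no combination is feasible.
    Assumes latency and area are additive across layers (a simplification).
    """
    best = None
    for combo in product(*layer_options):
        latency = sum(opt[1] for opt in combo)
        if latency > latency_budget:
            continue  # violates the real-time constraint
        area = sum(opt[2] for opt in combo)
        if best is None or area < best[0]:
            best = (area, latency, [opt[0] for opt in combo])
    return best


if __name__ == "__main__":
    print(min_area_under_latency(LAYER_OPTIONS, latency_budget=60))
```

With a budget of 60 cycles, this toy instance selects reuse factors 1, 2 and 2 for a total area of 880 units at 57 cycles. The search space grows exponentially with the number of layers, which is the kind of cost a MIP formulation is meant to avoid.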
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-driven performance and resource models
Combines hyperparameter optimization with architecture generation
MIP-based solver for efficient solution generation
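
The "set of optimal trade-offs between cost and accuracy" mentioned in the abstract is a Pareto front. As a minimal sketch of that idea, the function below filters a list of candidate designs down to those that no other candidate beats on both resource cost (lower is better) and accuracy (higher is better). The candidate names and numbers are hypothetical, and this is a generic Pareto filter rather than the procedure used by N-TORC.

```python
# Hypothetical candidates: (name, resource_cost, accuracy).
# Values are invented for illustration only.
CANDIDATES = [
    ("A", 100, 0.90),
    ("B", 150, 0.95),
    ("C", 160, 0.93),
    ("D", 90, 0.85),
    ("E", 100, 0.88),
]


def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    A candidate is dominated if another one has cost <= and accuracy >=,
    with at least one of the two strictly better.
    The result is sorted by increasing cost.
    """
    front = []
    for name, cost, acc in candidates:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in candidates
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda c: c[1])


if __name__ == "__main__":
    for name, cost, acc in pareto_front(CANDIDATES):
        print(name, cost, acc)
```

In this example, C is dominated by B and E is dominated by A, so the front consists of D, A and B. A designer would then choose among these based on the available area or the accuracy required.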
Suyash Vardhan Singh
University of South Carolina, Columbia, South Carolina, USA
Iftakhar Ahmad
University of South Carolina, Columbia, South Carolina, USA
David Andrews
University of Arkansas, Fayetteville, Arkansas, USA
Miaoqing Huang
University of Arkansas, Fayetteville, Arkansas, USA
Austin R. J. Downey
University of South Carolina, Columbia, South Carolina, USA
Jason D. Bakos
Professor of Computer Science and Engineering, University of South Carolina
computer architecture, reconfigurable computing, heterogeneous computing, high performance computing, embedded systems