A Data-Driven Dynamic Execution Orchestration Architecture

📅 2026-02-19

🤖 AI Summary

This work addresses two problems: existing programmable architectures handle sparse or irregular data inefficiently, and dedicated accelerators are inflexible when confronted with new kernels or input patterns. To bridge this gap, the paper proposes Canon, an architecture that uses programmable finite state machines (FSMs) for dynamic, data-driven execution orchestration, generating control instructions at runtime. Canon further introduces a time-lapsed SIMD execution model that constructs a continuously evolving dataflow to maximize parallelism. The design achieves performance and energy efficiency comparable to specialized accelerators across a range of data-agnostic and data-driven kernels, while preserving the programmability and flexibility of general-purpose architectures.
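
As a rough illustration of data-driven orchestration, the sketch below uses a transition table that is fixed ahead of time and turns sparse coordinates into control instructions while the data streams in. It is a toy model, not Canon's actual FSM: the state names, the instruction names (`MAC`, `SKIP_A`, `SKIP_B`, `HALT`), and the sparse dot-product kernel are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple

# Hypothetical "compile-time" FSM program: (state, event) -> (next_state, instruction).
# The table is fixed before execution; only the events depend on the data.
FSM_PROGRAM = {
    ("RUN", "EQ"): ("RUN", "MAC"),      # coordinates match: multiply-accumulate
    ("RUN", "LT"): ("RUN", "SKIP_A"),   # coordinate of A is behind: advance A
    ("RUN", "GT"): ("RUN", "SKIP_B"),   # coordinate of B is behind: advance B
    ("RUN", "END"): ("DONE", "HALT"),   # either stream exhausted: stop
}


def classify(a_coords: List[int], b_coords: List[int], i: int, j: int) -> str:
    """Turn incoming meta-information (two sparse coordinates) into an FSM event."""
    if i >= len(a_coords) or j >= len(b_coords):
        return "END"
    if a_coords[i] == b_coords[j]:
        return "EQ"
    return "LT" if a_coords[i] < b_coords[j] else "GT"


def orchestrate(a_coords: List[int], b_coords: List[int]) -> Iterator[Tuple[str, int, int]]:
    """Yield (instruction, i, j) as the FSM reacts to the coordinate streams."""
    state, i, j = "RUN", 0, 0
    while state != "DONE":
        event = classify(a_coords, b_coords, i, j)
        state, instr = FSM_PROGRAM[(state, event)]
        yield instr, i, j
        if instr in ("MAC", "SKIP_A"):
            i += 1
        if instr in ("MAC", "SKIP_B"):
            j += 1


def sparse_dot(a: List[Tuple[int, float]], b: List[Tuple[int, float]]) -> float:
    """Dot product of two sorted (coordinate, value) lists, driven by the FSM."""
    a_coords = [c for c, _ in a]
    b_coords = [c for c, _ in b]
    acc = 0.0
    for instr, i, j in orchestrate(a_coords, b_coords):
        if instr == "MAC":
            acc += a[i][1] * b[j][1]
    return acc
```

The instruction sequence here is never written out in advance: it depends entirely on where the nonzero coordinates of the two inputs happen to overlap.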

📝 Abstract

Domain-specific accelerators deliver exceptional performance on their target workloads through fabrication-time orchestrated datapaths. However, such specialized architectures often exhibit performance fragility when exposed to new kernels or irregular input patterns. In contrast, programmable architectures like FPGAs, CGRAs, and GPUs rely on compile-time orchestration to support a broader range of applications, but they are typically less efficient under irregular or sparse data. Pushing the boundaries of programmable architectures requires designs that can achieve efficiency and performance on par with specialized accelerators while retaining the agility of general-purpose architectures. We introduce Canon, a parallel architecture that bridges the gap between specialized and general-purpose architectures. Canon exploits data-level and instruction-level parallelism through two mechanisms. First, it employs a dynamic data-driven orchestration mechanism using programmable Finite State Machines (FSMs). These FSMs are programmed at compile time to encode high-level dataflow per state and translate incoming meta-information (e.g., sparse coordinates) into control instructions at runtime. Second, Canon introduces a time-lapsed SIMD execution model in which instructions are issued across a row of processing elements over several cycles, creating a staggered, pipelined execution. Together, these innovations amortize control overhead and allow instructions to change dynamically while constructing a continuously evolving dataflow that maximizes parallelism. Experimental evaluation shows that Canon delivers high performance across diverse data-agnostic and data-driven kernels, achieving efficiency comparable to specialized accelerators while retaining the flexibility of a general-purpose architecture.
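
The staggered issue pattern of time-lapsed SIMD can be pictured with a small scheduling sketch. It assumes, purely for illustration, that an instruction reaches each successive processing element (PE) one cycle after the previous one; the abstract only says issue happens "over several cycles", so the `hop_delay` parameter and the function name are hypothetical.

```python
from typing import List, Tuple


def time_lapsed_schedule(
    instructions: List[str], num_pes: int, hop_delay: int = 1
) -> List[List[Tuple[int, str]]]:
    """Return, for each cycle, the (pe_index, instruction) pairs executing then.

    Instruction t is issued to PE 0 at cycle t and reaches PE k at cycle
    t + k * hop_delay (an assumed, simplified timing model).
    """
    timeline: dict = {}
    for t, instr in enumerate(instructions):
        for pe in range(num_pes):
            cycle = t + pe * hop_delay
            timeline.setdefault(cycle, []).append((pe, instr))
    if not timeline:
        return []
    return [timeline.get(c, []) for c in range(max(timeline) + 1)]


if __name__ == "__main__":
    # Print one line per cycle to show the diagonal "wavefront" of instructions.
    for cycle, slots in enumerate(time_lapsed_schedule(["LOAD", "MAC", "STORE"], 4)):
        row = ", ".join(f"PE{pe}:{instr}" for pe, instr in sorted(slots))
        print(f"cycle {cycle}: {row}")
```

In this model a single instruction stream drives the whole row, so the control cost is paid once per instruction, while at any given cycle different PEs may be executing different instructions.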

Problem

Research questions and friction points this paper is trying to address.

programmable architectures

performance fragility

irregular data

specialized accelerators

execution orchestration

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-driven orchestration

dynamic execution

time-lapsed SIMD