🤖 AI Summary
This work addresses the control overhead incurred when multidimensional loop kernels execute on tightly coupled processor arrays (TCPAs), an overhead that can dominate execution time. Using the polyhedral model to represent the iteration space, the paper derives control conditions as unions of polyhedra and aggressively reduces them, cutting the number of control signals by 15× to 45× across several benchmarks. Building on this formulation, the authors propose a lightweight global controller that evaluates these conditions with bounded evaluation units and requires hardware comparable to a single processing element. Control signals are distributed through the array with a minimal number of delay elements, giving zero-overhead loop control. On PolyBench kernels, the entire control flow uses less than 10% of the total array resources.
📝 Abstract
Multidimensional loop kernels often suffer from control overhead that can dominate execution time on parallel loop accelerators. Tightly Coupled Processor Arrays (TCPAs) offload loop control to a global controller (GC), but existing approaches still require hundreds of control signals. We propose a method to derive and aggressively reduce these control conditions from a polyhedral representation of the iteration space, achieving reductions of 15× to 45× in the number of control signals across several benchmarks. We introduce a lightweight GC architecture that evaluates conditions as unions of polyhedra using bounded evaluation units, requiring hardware comparable to a single processing element. Control signals are distributed throughout the array with a minimal number of delay elements, resulting in zero-overhead loop control. Our evaluation on PolyBench kernels shows that the entire control flow requires less than 10% of the total array resources.
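
To make the phrase "conditions as unions of polyhedra" concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes each polyhedron is a conjunction of affine inequalities over the iteration vector and that a control signal is active when the current iteration lies in at least one polyhedron of the union. The names `Polyhedron`, `control_signal`, and the 4 × 4 example domain are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# One affine constraint over the iteration vector: coeffs . point + const >= 0
Constraint = Tuple[Sequence[int], int]


@dataclass(frozen=True)
class Polyhedron:
    """Conjunction of affine inequalities (hypothetical helper type)."""
    constraints: List[Constraint]

    def contains(self, point: Sequence[int]) -> bool:
        # A point is inside when every inequality holds.
        return all(
            sum(c * x for c, x in zip(coeffs, point)) + const >= 0
            for coeffs, const in self.constraints
        )


def control_signal(union: List[Polyhedron], point: Sequence[int]) -> bool:
    """A signal is active if the iteration lies in any polyhedron of the union."""
    return any(p.contains(point) for p in union)


# Assumed example: a 4 x 4 iteration space with 0 <= i, j <= 3.
N = 4

# i == 0, written as the two inequalities i >= 0 and -i >= 0.
first_row = Polyhedron([((1, 0), 0), ((-1, 0), 0)])
# j >= N - 1, which inside the domain means j == N - 1.
last_col = Polyhedron([((0, 1), -(N - 1))])

# The control condition is the union of the two polyhedra.
border = [first_row, last_col]

# Iterations at which the signal would be asserted.
active = [
    (i, j)
    for i in range(N)
    for j in range(N)
    if control_signal(border, (i, j))
]
```

In this sketch, one union with two small polyhedra replaces what would otherwise be separate per-condition signals; reducing the number of such conditions that must be evaluated and distributed is the kind of saving the abstract refers to.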