🤖 AI Summary
Problem: In high-level synthesis (HLS), sibling loops with unpredictable memory dependencies are executed sequentially, which limits parallelism in irregular code. Method: This paper proposes dynamic loop fusion, a compiler/hardware co-design that combines a program-order schedule inspired by polyhedral compilers with an address monotonicity analysis. Together these resolve unpredictable memory dependencies at runtime without searching address histories or executing the loops sequentially. The only requirement is that memory addresses are monotonically non-decreasing in inner loops. Contribution/Results: The approach lets multiple sibling loops run in parallel in dynamic HLS even when they contain unpredictable memory dependencies, a case that all current HLS tools handle sequentially. The evaluation reports an average speedup of 14× over static HLS and 4× over dynamic HLS.
📝 Abstract
Dynamic High-Level Synthesis (HLS) uses additional hardware to perform memory disambiguation at runtime, increasing loop throughput in irregular codes compared to static HLS. However, most irregular codes consist of multiple sibling loops, which currently have to be executed sequentially by all HLS tools. Static HLS performs loop fusion only on regular codes, while dynamic HLS relies on loops with dependencies to run to completion before the next loop starts. We present dynamic loop fusion for HLS, a compiler/hardware co-design approach that enables multiple loops to run in parallel, even if they contain unpredictable memory dependencies. Our only requirement is that memory addresses are monotonically non-decreasing in inner loops. We present a novel program-order schedule for HLS, inspired by polyhedral compilers, that together with our address monotonicity analysis enables dynamic memory disambiguation that requires neither searching address histories nor sequential loop execution. Our evaluation shows an average speedup of $14\times$ over static and $4\times$ over dynamic HLS.
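To make the role of the monotonicity requirement concrete, here is a minimal software sketch, not the paper's hardware design. It models two sibling loops, a store loop followed in program order by a load loop, and assumes both address streams are monotonically non-decreasing. Under that assumption a load can be checked against a single value, the next pending store address, instead of a history of past addresses. The names `fused_run`, `sequential_run`, and `is_non_decreasing` are hypothetical and exist only for this illustration.

```python
def is_non_decreasing(addrs):
    """Return True if every address is >= the one before it."""
    return all(a <= b for a, b in zip(addrs, addrs[1:]))


def sequential_run(mem, store_addrs, store_vals, load_addrs):
    """Reference behavior: the store loop finishes before the load loop starts."""
    mem = list(mem)
    for addr, val in zip(store_addrs, store_vals):
        mem[addr] = val
    return [mem[addr] for addr in load_addrs]


def fused_run(mem, store_addrs, store_vals, load_addrs):
    """Interleave the two loops while preserving the sequential result.

    A load from address `a` may issue as soon as the next pending store
    address is greater than `a` (or no stores remain). Because store
    addresses never decrease, no later store can write to `a`.
    Returns the loaded values and the number of loads that issued
    before the store loop had finished.
    """
    # Assumption of this sketch: both streams are non-decreasing.
    assert is_non_decreasing(store_addrs) and is_non_decreasing(load_addrs)
    mem = list(mem)
    loaded = []
    overlapped = 0
    s = 0  # index of the next pending store
    l = 0  # index of the next pending load
    while l < len(load_addrs):
        stores_done = s == len(store_addrs)
        if stores_done or store_addrs[s] > load_addrs[l]:
            # Safe: a single comparison against the store frontier.
            loaded.append(mem[load_addrs[l]])
            if not stores_done:
                overlapped += 1
            l += 1
        else:
            # The load might depend on this store, so the store goes first.
            mem[store_addrs[s]] = store_vals[s]
            s += 1
    return loaded, overlapped


if __name__ == "__main__":
    memory = [0] * 8
    st_addrs = [0, 0, 2, 5, 5, 7]
    st_vals = [10, 11, 12, 13, 14, 15]
    ld_addrs = [0, 1, 2, 3, 6, 7]
    values, early = fused_run(memory, st_addrs, st_vals, ld_addrs)
    assert values == sequential_run(memory, st_addrs, st_vals, ld_addrs)
    print(values, early)
```

The comparison is strict because non-decreasing addresses may repeat: a pending store to the same address must still complete before the load. If either stream could decrease, the single-value check would no longer be sufficient, which is why the sketch asserts monotonicity up front.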