🤖 AI Summary
Conventional hardware compilation flows defer pipeline optimization to the backend, by which point high-level structural information has been lost and global optimization is limited. To address this, the paper introduces PipeRTL, an explicit pipeline optimization pass at the intermediate representation (IR) level. PipeRTL models the legality constraints of register relocation in the IR, uses a learned timing predictor to approximate downstream delay, and formulates timing-constrained register relocation as a global minimum-cost flow problem. Implemented within the CIRCT framework and evaluated on open-source designs under a commercial backend synthesis flow, the method reduces critical-path delay, power, and area on average, and also gives backend retiming a stronger starting structure.
📝 Abstract
Modern hardware compilers increasingly rely on rich intermediate representations (IRs) to preserve optimization-relevant semantics before generating RTL code. However, one important optimization is still largely deferred to backend tools: pipeline optimization. In common RTL flows, registers are inserted by frontend heuristics or hardware designers and later adjusted by backend retiming after the design has been lowered to a much lower-level netlist representation. At that point, much of the operator-level structure originally exposed by the compiler IR has already been weakened or lost, limiting opportunities for global, compiler-level pipeline optimization.
This paper presents PipeRTL, an IR-level pipeline optimization framework for hardware compilers, instantiated in CIRCT. PipeRTL makes the legality of register relocation explicit in the IR, uses a learned timing predictor to approximate downstream delay behavior, and formulates timing-aware register relocation as a global min-cost flow problem under timing constraints. Evaluation on open-source designs under a commercial backend synthesis flow shows that PipeRTL improves downstream implementation quality on average, reducing critical-path delay, power, and area across the evaluated benchmarks, while also providing a stronger starting point for backend retiming. These results indicate that exposing pipeline optimization as an explicit compiler pass can deliver backend-meaningful gains by improving the sequential structure presented to later stages.
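To make "legality of register relocation" and "timing constraints" concrete, here is a minimal sketch. It does not reproduce PipeRTL's IR model, its learned predictor, or its min-cost flow solver, none of which the abstract specifies. It assumes the classical retiming convention instead: each operator `v` gets an integer label `r[v]`, and an edge `u -> v` carrying `w` registers carries `w + r[v] - r[u]` after relocation. All names, the fixed per-operator delays, and the toy graph are hypothetical.

```python
from collections import defaultdict

def retime(edges, r):
    """Relocate registers: new count on u->v is w + r[v] - r[u] (missing labels = 0)."""
    return {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in edges.items()}

def is_legal(edges):
    """A relocation is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for w in edges.values())

def critical_path(edges, delay):
    """Longest delay along register-free (combinational) paths."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for (u, v), w in edges.items():
        if w == 0:  # only zero-register edges extend a combinational path
            succ[u].append(v)
            indeg[v] += 1
    ready = [v for v in delay if indeg[v] == 0]
    arrival = dict(delay)  # arrival time at each operator's output
    visited = 0
    while ready:  # Kahn-style topological traversal
        u = ready.pop()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(delay):
        raise ValueError("combinational loop: some cycle has no register")
    return max(arrival.values())

# Toy design (assumed delays): both registers sit at the output of `add`.
delay = {"src": 0, "mul": 4, "add": 2, "sink": 0}
edges = {("src", "mul"): 0, ("mul", "add"): 0, ("add", "sink"): 2}

# Move one register from the output of `add` to its input.
moved = retime(edges, {"add": 1})
before = critical_path(edges, delay)
after = critical_path(moved, delay)

# Moving three registers across `add` is illegal: only two are available.
illegal = retime(edges, {"add": 3})
```

In this toy case the relocation keeps the total register count at two while splitting the `mul`/`add` chain, so the longest combinational path shrinks. A flow-based formulation like the one the abstract describes would search over all labels `r` at once rather than checking a single candidate.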