🤖 AI Summary
This work addresses the lack of documented instruction set architectures (ISAs) and compiler backends for tensor accelerators: ISA specifications are still written by hand by experts, which limits evaluation to a handful of operators. The paper presents ATLAAS, described as the first end-to-end MLIR-based pipeline that lifts accelerator semantics extracted from RTL to a tensor-level ISA specification. Starting from bit-level LLVM IR, an eight-pass semantic lifting pipeline recovers high-level features, including MAC idioms, saturation semantics, multi-dimensional buffer organizations, and data layout transformations. The resulting specifications enable automatic software stack generation through the ACT ecosystem. On Gemmini, the pipeline reduces bit-level MLIR by up to 92.9% on processing elements and by 24-34% on controller modules, and it discovers hardware features omitted from the hand-written reference, with correctness validated via Z3 SMT equivalence proofs. The same pipeline lifts all four datapath modules of VTA without accelerator-specific changes, giving an automated path from RTL to a performance-competitive compiler backend.
📝 Abstract
Numerous tensor accelerator designs have been proposed, yet most lack well-documented ISAs and compiler backends, limiting evaluation to a handful of operators. Recent work has shown that, given a tensor-level ISA specification, complete software stacks, including compiler backends, can be generated automatically, but writing such specifications remains a manual, expert-driven process.
We present ATLAAS, the first end-to-end MLIR-based pipeline that lifts RTL-extracted accelerator semantics to tensor ISA specifications. Starting from bit-level LLVM IR produced by prior architecture-level model extraction, ATLAAS applies an 8-pass semantic lifting pipeline that progressively recovers high-level tensor structure--MAC idioms, saturation semantics, multi-dimensional buffer organizations, and data layout transformations--emitting specifications that immediately enable automatic software stack generation through the ACT ecosystem.
We evaluate ATLAAS on the Gemmini systolic-array accelerator, where the pipeline collapses bit-level MLIR by up to 92.9% on processing elements and 24-34% on controller modules. ATLAAS discovers hardware features omitted from the hand-written reference, with correctness validated via Z3 SMT equivalence proofs. Generality is confirmed on TVM's VTA processor, where the same pipeline lifts all four datapath modules without accelerator-specific changes, enabling an automated path from RTL to a performance-competitive compiler backend.
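To make the idea of recovering "saturation semantics" from bit-level code more concrete, the following is a minimal illustrative sketch, not ATLAAS's implementation. It contrasts a multiply-accumulate written with explicit wrap-around arithmetic and overflow-detection logic (roughly what bit-level IR exposes) with the clamped form a lifted specification would state, and checks that the two agree. All function names here are hypothetical, the 8-bit accumulator width is chosen only to keep the search small, and exhaustive enumeration stands in for the Z3 SMT equivalence proofs the abstract mentions.

```python
from itertools import product


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac_bit_level(a, b, acc, bits):
    """Hypothetical low-level form: wrapping add plus explicit overflow muxing."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    p = (a * b) & mask          # product of two half-width operands fits in `bits`
    c = acc & mask
    s = (p + c) & mask          # wrap-around sum
    # Signed overflow: both addends share a sign that the sum does not.
    overflow = (~(p ^ c) & (p ^ s)) & sign
    if overflow:
        s = sign if (c & sign) else sign - 1   # clamp to min or max
    return to_signed(s, bits)


def mac_lifted(a, b, acc, bits):
    """Hypothetical lifted form: a saturating MAC stated as a clamp."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc + a * b))


def equivalent(bits=8):
    """Exhaustively compare both forms over all operand/accumulator values."""
    half = bits // 2
    operands = range(-(1 << (half - 1)), 1 << (half - 1))
    accs = range(-(1 << (bits - 1)), 1 << (bits - 1))
    return all(
        mac_bit_level(a, b, acc, bits) == mac_lifted(a, b, acc, bits)
        for a, b, acc in product(operands, operands, accs)
    )


if __name__ == "__main__":
    print("forms agree:", equivalent(8))
```

Exhaustive checking is only feasible at toy widths like this one; for realistic accumulator widths an SMT solver over bit-vectors is the natural tool, which is presumably why the paper relies on Z3.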