A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems

📅 2025-10-14
🤖 AI Summary
To address the high startup latency and low bus utilization of conventional DMACs in small, irregular data transfers, this paper proposes a lightweight descriptor-based DMA controller tailored to RISC-V Linux systems. The design introduces a streamlined descriptor format and a speculative descriptor prefetching mechanism with no misprediction penalty, yielding an AXI4-compliant implementation that is evaluated on an FPGA and synthesized in an advanced process node. Compared to an off-the-shelf descriptor-based DMAC IP, it reduces transfer startup latency by 1.66× and improves bus utilization with 64-byte transfers by up to 2.5× (ideal memory) or 3.6× (deep memory). It uses 11% fewer LUTs and 23% fewer FFs and requires no BRAM. Synthesized in GF12LP+, the design runs at over 1.44 GHz in 49.5 kGE, targeting efficient small-data movement in Linux-capable RISC-V SoCs.

📝 Abstract
With the ever-growing heterogeneity in computing systems, driven by modern machine learning applications, pressure is increasing on memory systems to handle arbitrary and more demanding transfers efficiently. Descriptor-based direct memory access controllers (DMACs) allow such transfers to be executed by decoupling memory transfers from processing units. Classical descriptor-based DMACs, however, are inefficient when handling arbitrary transfers of small unit sizes: excessive descriptor size and the serialized way in which the DMAC processes descriptors lead to large static overheads when setting up transfers. To tackle this inefficiency, we propose a descriptor-based DMAC optimized to efficiently handle arbitrary transfers of small unit sizes. We implement a lightweight descriptor format in an AXI4-based DMAC. We further increase performance by implementing a low-overhead speculative descriptor prefetching scheme that incurs no additional latency penalty on a misprediction. Our DMAC is integrated into a 64-bit Linux-capable RISC-V SoC and emulated on a Kintex FPGA to evaluate its performance. Compared to an off-the-shelf descriptor-based DMAC IP, we achieve 1.66x lower latency when launching transfers and increase bus utilization by up to 2.5x in an ideal memory system with 64-byte-length transfers, while requiring 11% fewer lookup tables, 23% fewer flip-flops, and no block RAMs. We can extend our lead in bus utilization to 3.6x with 64-byte-length transfers in deep memory systems. We synthesized our DMAC in GlobalFoundries' GF12LP+ node, achieving a clock frequency of over 1.44 GHz while occupying only 49.5 kGE.
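The static-overhead argument in the abstract can be made concrete with a back-of-envelope model. In this sketch the 64-bit data-bus width and the 24-cycle per-transfer overhead are illustrative assumptions, not figures from the paper, and `bus_utilization` is a hypothetical helper; the point is only how a fixed setup cost scales against payload size.

```python
import math


def bus_utilization(transfer_bytes: int, bus_bytes: int, overhead_cycles: int) -> float:
    """Fraction of cycles in which the data bus carries payload.

    Assumed model: the payload needs ceil(transfer_bytes / bus_bytes) data
    beats, and every transfer additionally pays a fixed number of overhead
    cycles (descriptor fetch and decode) during which the data bus is idle.
    """
    beats = math.ceil(transfer_bytes / bus_bytes)
    return beats / (beats + overhead_cycles)


if __name__ == "__main__":
    BUS_BYTES = 8    # assumed 64-bit data bus
    OVERHEAD = 24    # illustrative per-transfer overhead, not a measured value
    for size in (64, 256, 1024, 4096):
        print(f"{size:5d} B: utilization {bus_utilization(size, BUS_BYTES, OVERHEAD):.2f}")
```

Under these assumptions, a 64-byte transfer keeps the bus busy for only 8 data beats out of 32 cycles (25% utilization), whereas a 4 KiB transfer exceeds 95%. Because the payload beats are fixed by the transfer size, the overhead term is the only lever for small transfers: cutting it from 24 to 8 cycles in this model would raise 64-byte utilization to 50%.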
Problem

Research questions and friction points this paper is trying to address.

Optimizing DMACs for small, irregular data transfers
Reducing descriptor overhead in direct memory access controllers
Improving bus utilization with a speculative descriptor prefetching scheme
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight descriptor format for small data transfers
Speculative descriptor prefetching with zero misprediction penalty
AXI4-based DMA controller integrated into a 64-bit Linux-capable RISC-V SoC
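The two descriptor-related innovations above can be illustrated with a small timing model. This sketch is hypothetical: the abstract specifies neither the descriptor fields nor how the prefetch address is predicted, so it assumes a 32-byte descriptor holding source, destination, length and next pointer, and assumes the controller guesses that the next descriptor sits immediately after the current one in memory. The names `Descriptor`, `build_chain` and `chain_fetch_cycles` are illustrative only. The model shows why such a guess can be penalty-free: a correct guess overlaps the next descriptor read with the current one, while a wrong guess is simply discarded and the real read is issued at the same cycle it would have been without speculation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical descriptor size: four 64-bit words (src, dst, length, next).
DESC_BYTES = 32


@dataclass(frozen=True)
class Descriptor:
    """Illustrative compact descriptor; not the paper's actual layout."""
    src: int
    dst: int
    length: int
    next_addr: Optional[int]  # None marks the end of the chain


def build_chain(addrs: List[int], length: int = 64) -> Dict[int, Descriptor]:
    """Place one descriptor at each address, linking each to the next."""
    memory = {}
    for i, addr in enumerate(addrs):
        nxt = addrs[i + 1] if i + 1 < len(addrs) else None
        memory[addr] = Descriptor(src=0x1000 + length * i,
                                  dst=0x8000 + length * i,
                                  length=length, next_addr=nxt)
    return memory


def chain_fetch_cycles(memory: Dict[int, Descriptor], head: int,
                       latency: int, speculate: bool) -> int:
    """Return the cycle at which the last descriptor in the chain arrives.

    Assumed model: a read issued at cycle t returns at t + latency.
    Without speculation, the read of the next descriptor is issued only
    when the current one has returned, because its next pointer is needed.
    With speculation, a read of (current address + DESC_BYTES) is issued one
    cycle after the current read. A correct guess lets the next descriptor
    arrive one cycle later; a wrong guess is discarded and the real read is
    issued when the current descriptor returns, exactly as without
    speculation, so a misprediction costs no extra cycles.
    """
    addr, issue = head, 0
    while True:
        desc = memory[addr]
        arrival = issue + latency
        if desc.next_addr is None:
            return arrival
        if speculate and desc.next_addr == addr + DESC_BYTES:
            issue += 1          # speculative read hit: already in flight
        else:
            issue = arrival     # miss or no speculation: follow the pointer
        addr = desc.next_addr


if __name__ == "__main__":
    contiguous = build_chain([0x100 + DESC_BYTES * i for i in range(8)])
    scattered = build_chain([0x100, 0x400, 0x900, 0x2000])
    for name, chain in (("contiguous", contiguous), ("scattered", scattered)):
        base = chain_fetch_cycles(chain, 0x100, latency=20, speculate=False)
        spec = chain_fetch_cycles(chain, 0x100, latency=20, speculate=True)
        print(f"{name}: {base} cycles without speculation, {spec} with")
```

With an assumed 20-cycle read latency, eight contiguously stored descriptors are all fetched by cycle 27 with speculation instead of cycle 160 without it, while a scattered four-descriptor chain takes 80 cycles either way. The model deliberately ignores the bus bandwidth consumed by discarded prefetches and any limit on outstanding reads.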