SAGe: A Lightweight Algorithm-Architecture Co-Design for Alleviating Data Preparation Overheads in Large-Scale Genome Analysis

📅 2025-03-31

🤖 AI Summary

In large-scale genomic analysis, accessing, decompressing, and reformatting compressed data incurs substantial performance and energy overheads that severely limit accelerator efficiency. To address this, we propose SAGe, an algorithm-architecture co-design for on-demand decompression that comprises: (1) a lightweight, genome-aware (de)compression algorithm; (2) a customized hardware decoder; (3) a storage data layout optimized for sequential scans; and (4) dedicated interface commands. By exploiting features specific to genomic data, SAGe keeps compression ratios high while still enabling low-latency, energy-efficient data access. Compared to accelerators that rely on state-of-the-art decompressors, SAGe improves the average performance of state-of-the-art genomics accelerators by 3.0–12.3× and their energy efficiency by 18.8–49.6×, while attaining compression ratios competitive with state-of-the-art genomic compressors and remaining lightweight enough to integrate into existing accelerator pipelines.

📝 Abstract

There have been extensive efforts to accelerate genome analysis, given the exponentially growing volumes of genomic data. Prior works typically assume that the data is ready to be analyzed in the desired format; in real usage scenarios, however, it is common practice to store genomic data in storage systems in a compressed format. Unfortunately, preparing genomic data (i.e., accessing compressed data from storage, and decompressing and reformatting it) for an accelerator leads to large performance and energy overheads, significantly diminishing the accelerator's intended benefits. To harness the benefits of acceleration without needing to store massive genomic data uncompressed, there is a critical need to effectively address data preparation overheads. The solution must meet three criteria: (i) high performance and energy efficiency, (ii) high compression ratios, comparable to state-of-the-art genomic compression, and (iii) a lightweight design that integrates seamlessly with a broad range of genomics systems. This is challenging, particularly due to the high decompression complexity of state-of-the-art genomic compressors and the resource constraints of a wide range of genomics systems. We propose SAGe, an algorithm-architecture co-design for highly compressed storage and high-performance access of large-scale genomic data in desired formats. Based on a rigorous analysis of the features of genomic datasets, we propose a co-design of a new (de)compression algorithm, hardware, storage data layout, and interface commands. SAGe encodes data in structures decodable by efficient sequential scans and lightweight hardware. To still maintain high compression ratios, SAGe exploits unique features of genomic data. SAGe improves the average performance (energy efficiency) of state-of-the-art genomics accelerators by 3.0-12.3x (18.8-49.6x), compared to when the accelerators rely on state-of-the-art decompressors.

Problem

Research questions and friction points this paper is trying to address.

Reduces data preparation overheads in genome analysis

Ensures high performance and energy efficiency

Maintains high compression ratios for genomic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Algorithm-architecture co-design for genomic data

Lightweight hardware-efficient (de)compression algorithm

High compression ratios with efficient sequential scans
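
This summary does not describe SAGe's actual encoding, so the following is only a rough illustration of what "decodable by an efficient sequential scan" can mean. It uses a generic, well-known 2-bit nucleotide packing, not the paper's algorithm; the names `pack` and `unpack` are hypothetical and chosen for this sketch.

```python
# Illustrative only: a generic 2-bit nucleotide packing, NOT the SAGe format.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # first base sits in the low bits
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int):
    """Yield bases in order with a single forward pass over the packed bytes."""
    emitted = 0
    for byte in data:
        for j in range(4):
            if emitted == n_bases:
                return  # skip padding bits in the final byte
            yield BASES[(byte >> (2 * j)) & 0b11]
            emitted += 1


read = "GATTACAGATTACA"  # 14 bases
packed = pack(read)      # 4 bytes instead of 14
decoded = "".join(unpack(packed, len(read)))
```

The decoder never seeks backward or consults a dictionary: each byte is read once, in order, which is the property that makes this style of layout friendly to simple hardware. A real genomic compressor has to handle far more than this sketch does (for example, ambiguous bases, quality scores, and redundancy between reads).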

🔎 Similar Papers

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences