MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

📅 2023-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Metagenomic classification tools face a trade-off between accuracy and computational efficiency. MetaTrinity is a lightweight heuristic framework that combines seed-hash indexing, k-mer spectral statistics, fast approximate edit-distance computation, and a multi-stage filtering strategy to improve this trade-off. In the reported evaluation, it matches the accuracy of the alignment-based tool Metalign while running 4× faster. Compared with the fast classifier Kraken2, it takes 5× longer to run but is 17× more accurate, a 3.4× improvement in the accuracy-runtime trade-off. These results position MetaTrinity as a broadly applicable option for metagenomic classification that combines the strengths of alignment-free tools (speed) and alignment-based tools (accuracy).
📝 Abstract
Metagenomics, the study of genome sequences of diverse organisms cohabiting in a shared environment, has experienced significant advancements across various medical and biological fields. Metagenomic analysis is crucial, for instance, in clinical applications such as infectious disease screening and the diagnosis and early detection of diseases such as cancer. A key task in metagenomics is to determine the species present in a sample and their relative abundances. Currently, the field is dominated by either alignment-based tools, which offer high accuracy but are computationally expensive, or alignment-free tools, which are fast but lack the accuracy needed for many applications. In response to this dichotomy, we introduce MetaTrinity, a heuristics-based tool that achieves a fundamental improvement in the accuracy-runtime tradeoff over existing methods. We benchmark MetaTrinity against two leading metagenomic classifiers, each representing a different end of the performance-accuracy spectrum. On one end, Kraken2, a tool optimized for performance, shows modest accuracy but a rapid runtime. On the other end is Metalign, a tool optimized for accuracy. Our evaluations show that MetaTrinity matches the accuracy of Metalign while running 4x faster. This directly equates to a fourfold improvement in the accuracy-runtime tradeoff. Compared to Kraken2, MetaTrinity requires a 5x longer runtime yet delivers a 17x improvement in accuracy, which corresponds to a 3.4x improvement in the accuracy-runtime tradeoff. This dual comparison positions MetaTrinity as a broadly applicable solution for metagenomic classification, combining the advantages of both ends of the spectrum: speed and accuracy. MetaTrinity is publicly available at https://github.com/CMU-SAFARI/MetaTrinity.
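The tradeoff figures quoted in the abstract can be reproduced with simple arithmetic. The sketch below assumes that the tradeoff gain is the accuracy factor divided by the runtime factor. This definition is not stated in the text; it is inferred because it is consistent with the reported numbers. The function name `tradeoff_gain` is hypothetical and not part of MetaTrinity.

```python
def tradeoff_gain(accuracy_factor: float, runtime_factor: float) -> float:
    """Accuracy improvement divided by runtime increase.

    Assumed definition (not given in the source): a value above 1 means
    more accuracy is gained than runtime is spent, relative to a baseline.
    """
    if runtime_factor <= 0:
        raise ValueError("runtime_factor must be positive")
    return accuracy_factor / runtime_factor


# Versus Metalign: same accuracy (factor 1), 4x faster (runtime factor 1/4).
vs_metalign = tradeoff_gain(accuracy_factor=1.0, runtime_factor=1 / 4)

# Versus Kraken2: 17x more accurate, 5x longer runtime.
vs_kraken2 = tradeoff_gain(accuracy_factor=17.0, runtime_factor=5.0)

print(f"vs Metalign: {vs_metalign:.1f}x")
print(f"vs Kraken2:  {vs_kraken2:.1f}x")
```

Under this assumed definition, the two comparisons give 4x and 3.4x, matching the figures reported above.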
Problem

Research questions and friction points this paper is trying to address.

Metagenomic classification
Accuracy
Processing speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

MetaTrinity
Metagenomic Classification
Accuracy and Efficiency Balance
Arvid E. Gollwitzer
MIT | ETH Zurich | Broad Institute of MIT and Harvard | CERN
Computational Genomics, Clinical Metagenomics, Cancer Detection, Targeted Drug Delivery
Mohammed Alser
TT Assistant Professor, GSU, ALSER Lab
Bioinformatics, Metagenomics, Computational Genomics, Computer Architecture
Joel Bergtholdt
Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland
Joël Lindegger
Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland
Maximilian-David Rumpf
Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland
Can Firtina
Assistant Professor of Computer Science, UMD
Bioinformatics, Computer Architecture, Hardware-Software Co-design
Serghei Mangul
USC
Genomics, Bioinformatics
Onur Mutlu
Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland