🤖 AI Summary
This work addresses limitations of existing lossless genomic compression methods, namely shallow modeling, poor adaptability, and unfriendly user interaction, by proposing a three-tier multi-agent evolutionary compression framework powered by large language models (LLMs). The user layer enables natural language interaction, the cognitive layer jointly optimizes algorithm, dataset, and system configurations, and the compression layer performs compression and decompression through automated multi-knowledge learning. The framework offers three operational modes: compression-ratio priority, throughput priority, and balanced. According to the authors, this is the first integration of LLMs and multi-agent evolutionary learning into genomic compression, yielding an evolvable, adaptive, and user-friendly system. Evaluated on nine datasets against 14 baseline methods, the framework achieves an average compression ratio improvement of 16.37% and up to a 9.23× increase in throughput.
📝 Abstract
Lossless compression has made significant advances in Genomics Data (GD) storage, sharing, and management. Current learning-based methods are non-evolvable and suffer from low-level compression modeling, limited adaptability, and user-unfriendly interfaces. To this end, we propose AgentGC, the first evolutionary Agent-based GD Compressor, consisting of 3 layers with agents named Leader and Worker. Specifically, 1) the User layer provides a user-friendly interface via the Leader combined with an LLM; 2) the Cognitive layer, driven by the Leader, integrates the LLM to jointly optimize algorithm, dataset, and system, addressing the issues of low-level modeling and limited adaptability; and 3) the Compression layer, headed by the Worker, performs compression and decompression via an automated multi-knowledge learning-based compression framework. On top of AgentGC, we design 3 modes to support diverse scenarios: CP for compression-ratio priority, TP for throughput priority, and BM for balanced mode. Compared with 14 baselines on 9 datasets, the average compression-ratio gains are 16.66%, 16.11%, and 16.33%, and the throughput gains are 4.73x, 9.23x, and 9.15x, respectively.