PISCO: Pretty Simple Compression for Retrieval-Augmented Generation

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high inference overhead and context-length limitations that long documents impose on Retrieval-Augmented Generation (RAG) systems, this paper proposes a lightweight document compression method that requires neither pretraining nor annotated data. It introduces a question-answer (QA)-driven, sequence-level knowledge distillation framework that enables end-to-end, unsupervised soft compression of documents. The approach achieves a 16× compression ratio with only 0–3% accuracy degradation and supports full fine-tuning of 7–10B LLMs on a single GPU within 48 hours. On multi-source RAG QA benchmarks, it improves answer accuracy by 8% over existing compression baselines, substantially reducing computational cost and improving system scalability. The core contribution is the use of QA signals, rather than token-level or sentence-level proxies, as direct, sequence-level supervision for knowledge distillation, thereby avoiding information loss and eliminating reliance on labeled data or task-specific training pipelines.
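The sequence-level distillation objective described above can be illustrated with a minimal sketch. The idea is that a teacher LLM (which sees the full documents) generates an answer, and the student (which sees only the compressed document representation) is trained to maximize the likelihood of that same answer sequence. The function below is a hypothetical, simplified illustration of such a loss, not the paper's actual implementation; `student_logits` and `teacher_tokens` stand in for real model outputs.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_kd_loss(student_logits, teacher_tokens):
    """Sequence-level distillation loss: mean negative log-likelihood
    of the teacher's generated answer tokens under the student's
    per-step output distributions.

    student_logits: (T, vocab) logits from the student, which only
                    sees the compressed document representation.
    teacher_tokens: (T,) token ids of the teacher's answer, generated
                    from the full, uncompressed documents.
    """
    probs = softmax(student_logits)                 # (T, vocab)
    steps = np.arange(len(teacher_tokens))
    return -np.log(probs[steps, teacher_tokens]).mean()

# Toy usage: a student whose logits are peaked on the teacher's
# answer tokens incurs a lower loss than a random student.
rng = np.random.default_rng(0)
random_logits = rng.normal(size=(3, 5))
answer = np.array([2, 0, 4])
peaked_logits = np.full((3, 5), -10.0)
peaked_logits[np.arange(3), answer] = 10.0
```

Because the supervision signal is the teacher's full answer sequence rather than per-token soft labels, no human annotations are needed: any document from which questions can be generated yields a training pair.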

📝 Abstract
Retrieval-Augmented Generation (RAG) pipelines enhance Large Language Models (LLMs) by retrieving relevant documents, but they face scalability issues due to high inference costs and limited context size. Document compression is a practical solution, but current soft compression methods suffer from accuracy losses and require extensive pretraining. In this paper, we introduce PISCO, a novel method that achieves a 16x compression rate with minimal accuracy loss (0-3%) across diverse RAG-based question-answering (QA) tasks. Unlike existing approaches, PISCO requires no pretraining or annotated data, relying solely on sequence-level knowledge distillation from document-based questions. With the ability to fine-tune a 7-10B LLM in 48 hours on a single A100 GPU, PISCO offers a highly efficient and scalable solution. We present comprehensive experiments showing that PISCO outperforms existing compression models by 8% in accuracy.
Problem

Research questions and friction points this paper is trying to address.

RAG Pipelines
Information Loss
Pre-training Duration
Innovation

Methods, ideas, or system contributions that make the work stand out.

PISCO
Document Compression
Information Retention