PISCO: Pretty Simple Compression for Retrieval-Augmented Generation

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high inference overhead and context-length limitations that long documents impose on Retrieval-Augmented Generation (RAG) systems, this paper proposes a lightweight document compression method that requires neither pretraining nor annotated data. It introduces a question-answer (QA)-driven, sequence-level knowledge distillation framework that enables end-to-end, unsupervised soft compression of documents. The approach achieves a 16× compression ratio with only 0–3% accuracy degradation and supports full fine-tuning of 7–10B LLMs on a single GPU within 48 hours. On multi-source RAG QA benchmarks, it improves answer accuracy by 8% over existing compression baselines, substantially reducing computational cost and improving system scalability. The core contribution is the use of QA signals, rather than token-level or sentence-level proxies, as direct, sequence-level supervision for knowledge distillation, thereby avoiding information loss and eliminating reliance on labeled data or task-specific training pipelines.
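The sequence-level distillation objective described above can be illustrated with a minimal sketch. The idea is that a teacher LLM (which sees the full documents) generates an answer, and the student (which sees only the compressed document representation) is trained to maximize the likelihood of that same answer sequence. The function below is a hypothetical, simplified illustration of such a loss, not the paper's actual implementation; `student_logits` and `teacher_tokens` stand in for real model outputs.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_kd_loss(student_logits, teacher_tokens):
    """Sequence-level distillation loss: mean negative log-likelihood
    of the teacher's generated answer tokens under the student's
    per-step output distributions.

    student_logits: (T, vocab) logits from the student, which only
                    sees the compressed document representation.
    teacher_tokens: (T,) token ids of the teacher's answer, generated
                    from the full, uncompressed documents.
    """
    probs = softmax(student_logits)                 # (T, vocab)
    steps = np.arange(len(teacher_tokens))
    return -np.log(probs[steps, teacher_tokens]).mean()

# Toy usage: a student whose logits are peaked on the teacher's
# answer tokens incurs a lower loss than a random student.
rng = np.random.default_rng(0)
random_logits = rng.normal(size=(3, 5))
answer = np.array([2, 0, 4])
peaked_logits = np.full((3, 5), -10.0)
peaked_logits[np.arange(3), answer] = 10.0
```

Because the supervision signal is the teacher's full answer sequence rather than per-token soft labels, no human annotations are needed: any document from which questions can be generated yields a training pair.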

📝 Abstract
Retrieval-Augmented Generation (RAG) pipelines enhance Large Language Models (LLMs) by retrieving relevant documents, but they face scalability issues due to high inference costs and limited context size. Document compression is a practical solution, but current soft compression methods suffer from accuracy losses and require extensive pretraining. In this paper, we introduce PISCO, a novel method that achieves a 16x compression rate with minimal accuracy loss (0-3%) across diverse RAG-based question-answering (QA) tasks. Unlike existing approaches, PISCO requires no pretraining or annotated data, relying solely on sequence-level knowledge distillation from document-based questions. With the ability to fine-tune a 7-10B LLM in 48 hours on a single A100 GPU, PISCO offers a highly efficient and scalable solution. We present comprehensive experiments showing that PISCO outperforms existing compression models by 8% in accuracy.
Problem

Research questions and friction points this paper is trying to address.

RAG Pipelines
Information Loss
Pre-training Duration
Innovation

Methods, ideas, or system contributions that make the work stand out.

PISCO
Document Compression
Information Retention