E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Addressing the "impossible triangle" in long-context modeling for large language models, in which high performance, computational efficiency, and compatibility with pretrained models are hard to achieve simultaneously, this paper proposes an encoder-extension architecture. It freezes a pretrained text encoder (e.g., BERT or CLIP) to compress long inputs into soft prompts, then introduces a learnable adapter trained with two objectives: reconstruction of the encoder output and long-context instruction fine-tuning. This lets a decoder-only LLM model extended contexts efficiently without modifying its backbone, so the method is plug-and-play with existing decoder-only models. Evaluated on multi-turn dialogue and long-document summarization, it outperforms state-of-the-art long-context approaches while achieving a 2.3× inference speedup and a 37% reduction in GPU memory consumption, demonstrating strong performance, high efficiency, and seamless integration.

📝 Abstract
In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing the long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.
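The pipeline the abstract describes (split into chunks, compress each chunk with a frozen encoder, align with the decoder via an adapter) can be sketched at the shape level. This is a minimal illustration with made-up dimensions and random stand-ins for the encoder and adapter, not the paper's implementation.

```python
import numpy as np

# Hypothetical dimensions: encoder output size, LLM embedding size, chunk length.
ENC_DIM, LLM_DIM, CHUNK_LEN = 768, 4096, 512

rng = np.random.default_rng(0)

def encode_chunk(chunk_tokens):
    """Stand-in for a frozen pretrained text encoder: one chunk of tokens
    is compressed into a single embedding vector."""
    return rng.standard_normal(ENC_DIM)

def adapter(chunk_embedding, W):
    """Learnable adapter mapping encoder space into the LLM's embedding space
    (a single linear layer here for illustration)."""
    return W @ chunk_embedding

def build_soft_prompts(long_context_tokens, W):
    # 1) Split the long context into fixed-size chunks.
    chunks = [long_context_tokens[i:i + CHUNK_LEN]
              for i in range(0, len(long_context_tokens), CHUNK_LEN)]
    # 2) Compress each chunk, then 3) align it with the decoder's space.
    return np.stack([adapter(encode_chunk(c), W) for c in chunks])

tokens = list(range(2048))  # a toy 2048-token "long context"
W = rng.standard_normal((LLM_DIM, ENC_DIM)) * 0.01
prompts = build_soft_prompts(tokens, W)
print(prompts.shape)  # (4, 4096): four soft-prompt vectors for the decoder
```

The decoder-only LLM then consumes these four vectors as prefix embeddings instead of the original 2048 tokens, which is where the inference-time savings come from.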
Problem

Research questions and friction points this paper is trying to address.

Resolves the "impossible triangle" of long-context performance, low computational complexity, and compatibility with pretrained models
Enhances LLM reasoning by compressing context chunks with an encoder and aligning them through an adapter
Improves the efficiency and effectiveness of document summarization and question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunking long contexts into compressed soft prompts
Aligning encoder representations via adapter module
Employing dual training objectives for enhanced reasoning
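The dual training objectives named above can be sketched as a weighted sum of a reconstruction term and an instruction-tuning term. The loss forms and the mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_loss(decoded, original):
    # "Understand" objective: the decoder should be able to restore a
    # chunk's content from its soft prompt (MSE as a simple stand-in).
    return float(np.mean((decoded - original) ** 2))

def instruction_loss(logits, targets):
    # "Reason" objective: next-token cross-entropy on long-context
    # instruction data, with soft prompts prepended to the input.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Toy tensors standing in for model outputs during training.
decoded = rng.standard_normal((8, 16))
original = rng.standard_normal((8, 16))
logits = rng.standard_normal((8, 100))
targets = rng.integers(0, 100, 8)

alpha = 0.5  # assumed mixing weight; the paper's schedule may differ
total = (alpha * reconstruction_loss(decoded, original)
         + (1 - alpha) * instruction_loss(logits, targets))
print(total > 0)  # True: both terms are nonnegative
```

Training on both terms jointly is what lets the LLM both decode the compressed chunks and follow instructions over them.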