🤖 AI Summary
Addressing the “impossible triangle” in long-context modeling for large language models—where high performance, computational efficiency, and compatibility with pretrained models cannot be achieved simultaneously—this paper proposes an encoder-extension architecture. It freezes a pretrained text encoder (e.g., BERT or CLIP) to compress long inputs into soft prompts, then introduces a learnable adapter coupled with dual-objective adaptation training: a reconstruction loss and long-context instruction fine-tuning. This enables decoder-only LLMs to model extended contexts efficiently without modifying the backbone architecture, making the method fully plug-and-play with existing decoder-only models. Evaluated on multi-turn dialogue and long-document summarization, it outperforms state-of-the-art long-context approaches, achieving a 2.3× inference speedup and a 37% reduction in GPU memory consumption—demonstrating strong performance, high efficiency, and seamless integration.
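The compression pipeline above—split the long input into chunks, encode each chunk with a frozen encoder, and project the results into the LLM's embedding space as soft prompts—can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the module sizes, the mean-pooling step, and the two-layer adapter are all assumptions.

```python
import torch
import torch.nn as nn

class ChunkCompressor(nn.Module):
    """Stand-in for the frozen pretrained text encoder: compresses each
    chunk of tokens into a single embedding vector."""
    def __init__(self, vocab_size=1000, enc_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, enc_dim)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=enc_dim, nhead=4, batch_first=True)
        for p in self.parameters():          # encoder stays frozen
            p.requires_grad = False

    def forward(self, chunk_ids):            # (n_chunks, chunk_len)
        h = self.encoder(self.embed(chunk_ids))
        return h.mean(dim=1)                 # pool to one vector per chunk

class Adapter(nn.Module):
    """Trainable projection aligning encoder outputs with the
    decoder-only LLM's embedding space (the soft prompts)."""
    def __init__(self, enc_dim=64, llm_dim=128):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim), nn.GELU(),
            nn.Linear(llm_dim, llm_dim))

    def forward(self, chunk_embs):           # (n_chunks, enc_dim)
        return self.proj(chunk_embs)         # (n_chunks, llm_dim)

# Split one long token sequence into fixed-size chunks, compress, adapt.
long_input = torch.randint(0, 1000, (1, 512))   # one long sequence
chunks = long_input.view(-1, 64)                # 8 chunks of 64 tokens
compressor, adapter = ChunkCompressor(), Adapter()
soft_prompts = adapter(compressor(chunks))      # (8, 128) soft prompts
```

The soft prompts would then be prepended to the decoder-only LLM's input embeddings, so the backbone itself never sees (or attends over) the full-length token sequence.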
📝 Abstract
In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.
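The two training objectives named above—reconstruction of the encoder output and long-context instruction fine-tuning—can be combined as a weighted sum of two cross-entropy terms. The sketch below is a hedged illustration under assumed shapes; the weighting scheme, tensor sizes, and the function name `dual_objective_loss` are assumptions, not the paper's reported formulation.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def dual_objective_loss(recon_logits, chunk_ids,
                        lm_logits, answer_ids, alpha=0.5):
    """Combine (1) reconstructing the compressed chunk tokens from the
    soft prompt and (2) next-token prediction on the instruction-tuning
    answer. `alpha` (assumed) balances the two terms."""
    recon = ce(recon_logits.flatten(0, 1), chunk_ids.flatten())
    lm = ce(lm_logits.flatten(0, 1), answer_ids.flatten())
    return alpha * recon + (1 - alpha) * lm

# Toy tensors standing in for model outputs.
recon_logits = torch.randn(2, 8, 100)        # (batch, chunk_len, vocab)
chunk_ids = torch.randint(0, 100, (2, 8))    # tokens to reconstruct
lm_logits = torch.randn(2, 5, 100)           # (batch, answer_len, vocab)
answer_ids = torch.randint(0, 100, (2, 5))   # ground-truth answer tokens
loss = dual_objective_loss(recon_logits, chunk_ids, lm_logits, answer_ids)
```

Intuitively, the reconstruction term forces the soft prompts to retain the chunk's content, while the instruction-tuning term teaches the LLM to actually use them when answering.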