Chunk-Distilled Language Modeling

📅 2024-12-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low generation efficiency and the difficulty of dynamic knowledge adaptation in large language models (LLMs), this paper proposes Chunk-Distilled Language Modeling (CD-LM). CD-LM enables single-step generation of multi-token text chunks via retrieval augmentation, balancing inference speed with real-time knowledge updating. Its core innovation is the integration of a lightweight retrieval module with a deep LLM for chunk-level generation, enabling fine-tuning-free customization of model behavior and domain-specific knowledge injection while improving controllability over the output distribution. Datastores are constructed via a dual-path strategy that combines chunks distilled from the LLM's own internal knowledge with human-annotated corpora, providing both effective self-distillation and high-quality expert supervision. Experiments demonstrate that CD-LM significantly improves generation efficiency and accuracy across multiple downstream tasks, achieving a superior trade-off among performance, adaptability, and computational cost.

📝 Abstract
We introduce Chunk-Distilled Language Modeling (CD-LM), an approach to text generation that addresses two challenges in current large language models (LLMs): the inefficiency of token-level generation, and the difficulty of adapting to new data and knowledge. Our method combines deep network-based LLMs with a straightforward retrieval module, which allows the generation of multi-token text chunks at a single decoding step. Our retrieval framework enables flexible construction of model- or domain-specific datastores, either leveraging the internal knowledge of existing models, or incorporating expert insights from human-annotated corpora. This adaptability allows for enhanced control over the language model's distribution without necessitating additional training. We present the CD-LM formulation along with performance metrics demonstrating its ability to improve language model performance and efficiency across a diverse set of downstream tasks. Code and data will be made publicly available.
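The abstract describes generating multi-token chunks in a single decoding step by pairing an LLM with a retrieval datastore. A minimal sketch of that idea (illustrative only, not the paper's actual algorithm; `build_datastore`, `generate`, and the context-matching scheme are assumptions for exposition): when the recent context matches a stored key, the whole retrieved chunk is emitted at once, otherwise decoding falls back to the base model one token at a time.

```python
# Illustrative sketch of chunk-level retrieval decoding (hypothetical names;
# the real CD-LM method differs in how contexts are matched and verified).

def build_datastore(pairs, context_len=2):
    """Map a short token context (tuple) to a stored multi-token chunk."""
    store = {}
    for context, chunk in pairs:
        store[tuple(context[-context_len:])] = list(chunk)
    return store

def generate(prompt_tokens, next_token_fn, datastore, max_len=20, context_len=2):
    """Greedy generation with chunk shortcuts from the retrieval datastore.

    Returns the token sequence and the number of decoding steps taken;
    a retrieval hit emits a whole chunk in one step.
    """
    out = list(prompt_tokens)
    steps = 0
    while len(out) < max_len:
        key = tuple(out[-context_len:])
        if key in datastore:              # hit: emit the full chunk at once
            out.extend(datastore[key])
        else:                             # miss: ordinary token-level decoding
            out.append(next_token_fn(out))
        steps += 1
    return out, steps
```

Because a hit appends several tokens per step, the step count can be much smaller than the number of generated tokens, which is the efficiency gain the abstract refers to; swapping in a different datastore changes the model's behavior without any training.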
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Generation Efficiency
Adaptability to New Information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunk-Distilled Language Modeling
Efficiency Enhancement
Expert Knowledge Integration
Yanhong Li
University of Chicago & TTIC
Karen Livescu
TTI-Chicago
speech and language processing, machine learning
Jiawei Zhou
TTIC & Stony Brook University