🤖 AI Summary
To address the inefficiency of token-level generation and the difficulty of adapting to new data and knowledge in large language models (LLMs), this paper proposes Chunk-Distilled Language Modeling (CD-LM). CD-LM pairs a deep network-based LLM with a lightweight retrieval module, allowing multi-token text chunks to be emitted in a single decoding step and balancing inference speed with real-time knowledge updating. Its core idea is chunk-level generation backed by flexibly constructed model- or domain-specific datastores, built either from the internal knowledge of existing models or from human-annotated corpora; this enables customization of model behavior and injection of domain-specific knowledge without additional training, while improving control over the output distribution. Experiments demonstrate that CD-LM improves generation efficiency and accuracy across multiple downstream tasks, achieving a favorable trade-off among performance, adaptability, and computational cost.
📝 Abstract
We introduce Chunk-Distilled Language Modeling (CD-LM), an approach to text generation that addresses two challenges in current large language models (LLMs): the inefficiency of token-level generation, and the difficulty of adapting to new data and knowledge. Our method combines deep network-based LLMs with a straightforward retrieval module, which allows the generation of multi-token text chunks at a single decoding step. Our retrieval framework enables flexible construction of model- or domain-specific datastores, either leveraging the internal knowledge of existing models, or incorporating expert insights from human-annotated corpora. This adaptability allows for enhanced control over the language model's distribution without necessitating additional training. We present the CD-LM formulation along with performance metrics demonstrating its ability to improve language model performance and efficiency across a diverse set of downstream tasks. Code and data will be made publicly available.
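The abstract's central mechanism — falling back to token-level LLM decoding unless a retrieval datastore supplies a multi-token chunk for the current context — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the datastore keys, the `toy_lm` stand-in, and the two-token context window are all assumptions made for the example.

```python
# Toy sketch of chunk-level retrieval-augmented decoding in the spirit of
# CD-LM. All names (DATASTORE, toy_lm, generate) are hypothetical.

DATASTORE = {
    # Maps a recent-context key to a multi-token chunk that can be
    # emitted in a single decoding step.
    ("the", "united"): ["states", "of", "america"],
}

def toy_lm(context):
    """Stand-in for token-level LLM decoding: returns one next token."""
    vocab = ["the", "united", "kingdom", "<eos>"]
    return vocab[len(context) % len(vocab)]

def generate(prompt, max_steps=10):
    tokens = list(prompt)
    steps = 0
    while steps < max_steps and (not tokens or tokens[-1] != "<eos>"):
        key = tuple(tokens[-2:])        # look up recent context
        chunk = DATASTORE.get(key)
        if chunk:
            tokens.extend(chunk)        # one step emits a whole chunk
        else:
            tokens.append(toy_lm(tokens))  # fall back to the base LM
        steps += 1
    return tokens
```

The efficiency gain comes from the `tokens.extend(chunk)` branch: a retrieval hit produces several tokens in one step, while a miss costs the same single-token step as ordinary decoding. Swapping out `DATASTORE` adapts the output distribution with no retraining.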