Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the difficulty large language models face in handling dynamic knowledge: constrained context lengths, retrieval noise, and catastrophic forgetting all hinder the effective decoupling of factual memory from reasoning. To overcome this, the authors propose DRIFT, a dual-model framework in which a lightweight knowledge model dynamically compresses external documents into query-oriented implicit fact tokens. These tokens are then injected into the embedding space of a separate reasoning model, explicitly decoupling knowledge extraction from the reasoning process. By replacing raw text with implicit tokens, DRIFT substantially extends the effective context length while preserving reasoning accuracy. Experiments show that DRIFT significantly outperforms strong same-scale baselines on long-context tasks, offering both higher inference efficiency and improved accuracy.

📝 Abstract
The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at https://github.com/Lancelot-Xie/DRIFT.
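The dual-model flow described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions, the toy `knowledge_model`, the projection matrix `W_proj`, and the fixed fact-token count `N_FACT` are all assumptions made for the sketch. The point is the shape of the pipeline: each raw document chunk is compressed, conditioned on the query, into a small number of dense fact tokens, which are then projected into the reasoning model's embedding space in place of the chunk's raw token embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): the knowledge model's hidden
# width, the reasoning model's embedding width, and the number of
# implicit fact tokens each document chunk is compressed into.
D_KNOW, D_REASON, N_FACT = 256, 512, 8

def knowledge_model(query_ids, chunk_ids):
    """Stand-in for the lightweight knowledge model: encode a
    (query, chunk) pair into N_FACT dense fact-token states.
    A real system would run a small LM here; this toy version just
    returns deterministic pseudo-random states per input pair."""
    seed = hash((tuple(query_ids), tuple(chunk_ids))) % (2**32)
    return np.random.default_rng(seed).standard_normal((N_FACT, D_KNOW))

# Learned projector mapping fact tokens into the reasoning model's
# embedding space (randomly initialised for this sketch).
W_proj = rng.standard_normal((D_KNOW, D_REASON)) / np.sqrt(D_KNOW)

def compress_chunks(query_ids, chunks):
    """Replace each raw chunk with N_FACT implicit fact embeddings,
    ready to be injected as input embeddings of the reasoning model."""
    fact_embs = [knowledge_model(query_ids, c) @ W_proj for c in chunks]
    return np.concatenate(fact_embs, axis=0)  # (len(chunks)*N_FACT, D_REASON)

# Example: three 100-token chunks (300 raw tokens) collapse to
# 3 * N_FACT = 24 implicit tokens in the reasoning model's space.
query = [101, 7, 42]
chunks = [list(range(i, i + 100)) for i in range(3)]
injected = compress_chunks(query, chunks)
print(injected.shape)  # (24, 512)
```

In an actual dual-model setup, `injected` would be concatenated with the query's token embeddings and passed to the reasoning model via its input-embedding interface, so the context cost of each chunk drops from its raw token count to `N_FACT`.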
Problem

Research questions and friction points this paper is trying to address.

long-context inference
knowledge integration
reasoning decoupling
factual consistency
context window limitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Reasoning
Implicit Fact Tokens
Dual-Model Architecture
Long-Context Inference
Knowledge Compression