🤖 AI Summary
Existing sequence modeling approaches struggle to efficiently decouple compositional reasoning from local static knowledge retrieval, and they generalize poorly to non-textual modalities. This work proposes a latent-space conditional memory module that learns discrete symbols from hidden states and performs N-gram lookups over them, enabling tokenizer-agnostic knowledge retrieval. Because it operates independently of tokenizer IDs, the method extends to multimodal inputs and allows post-hoc injection of domain-specific knowledge into pretrained models. Experiments show consistent perplexity reductions on long-context language modeling, outperforming both Transformer and Engram baselines. The approach also yields overall performance gains on vision-language and vision-language-action tasks with limited inference and memory overhead.
📝 Abstract
Sequence modeling requires both compositional reasoning and local static knowledge retrieval, yet standard Transformers handle both through dense computation. Engram partially decouples retrieval from the backbone, but its token-based keys remain tied to text tokenization and hash compression. We propose Lngram, a latent-space conditional memory module that learns discrete symbols directly from hidden states and performs N-gram lookup over these symbols. This design removes the dependence on tokenizer IDs and naturally extends to non-text modalities. In our evaluated settings, Lngram outperforms Transformer and Engram baselines, consistently reduces perplexity in long-context language modeling, and effectively injects domain knowledge when added post hoc to pretrained models. Joint training with the backbone further surpasses full fine-tuning, while experiments on vision-language and vision-language-action tasks show overall gains. Analyses with LogitLens and CKA suggest that Lngram enables prediction-relevant information to emerge earlier, increasing effective depth with limited inference and memory overhead. Code is available at https://github.com/zyaaa-ux/Lngram.
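The lookup described in the abstract (discrete symbols learned from hidden states, then N-gram lookup over those symbols) can be illustrated with a minimal pure-Python sketch. Everything below is an assumption made for illustration and is not taken from the Lngram code: the nearest-neighbor codebook quantizer, the left-padding symbol, the direct mixed-radix table index, the additive residual update, and all function names are hypothetical. In the actual method the codebook and memory table would be learned; here they are fixed toy values.

```python
import random


def quantize(hidden, codebook):
    """Map one hidden vector to the index of its nearest codebook entry.
    Assumption: squared Euclidean nearest neighbor stands in for the learned symbolizer."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(hidden, codebook[k]))


def ngram_keys(symbols, n, pad):
    """For each position, return the tuple of the last n symbols, left-padded with `pad`."""
    padded = [pad] * (n - 1) + list(symbols)
    return [tuple(padded[t:t + n]) for t in range(len(symbols))]


def key_to_index(key, base):
    """Encode an N-gram of symbol ids as one table index (mixed radix, no hashing)."""
    index = 0
    for s in key:
        index = index * base + s
    return index


def latent_ngram_lookup(hidden_states, codebook, table, n):
    """Quantize hidden states, look up one memory vector per N-gram, add it to the state.
    Assumption: a plain residual addition; a real module may gate or project the memory."""
    pad = len(codebook)          # extra symbol id used before the sequence start
    base = len(codebook) + 1     # symbol ids plus the padding id
    symbols = [quantize(h, codebook) for h in hidden_states]
    enriched = []
    for h, key in zip(hidden_states, ngram_keys(symbols, n, pad)):
        memory = table[key_to_index(key, base)]
        enriched.append([x + m for x, m in zip(h, memory)])
    return symbols, enriched


# Toy setup: 3 codebook entries in a 2-dimensional latent space, bigram lookup.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
n = 2
base = len(codebook) + 1
rng = random.Random(0)
table = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(base ** n)]

hidden_states = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7], [0.9, 1.2]]
symbols, enriched = latent_ngram_lookup(hidden_states, codebook, table, n)
print(symbols)  # → [0, 1, 2, 1]
```

Because the keys are built from hidden states rather than tokenizer IDs, the same routine would apply unchanged to hidden states derived from image patches or action tokens, which is the property the abstract highlights. The direct index is only practical here because the toy symbol vocabulary is tiny; the table grows as (K + 1)^N for K symbols.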