Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the "curse of repetition" commonly observed in diffusion-based multimodal large language models (dMLLMs) during cache-accelerated inference. From an information flow perspective, the study reveals that context tokens serve as semantic anchors whose information entropy should converge in deeper network layers; repetitive generation arises when perturbations disrupt this convergence. To mitigate this issue, the authors propose CoTA, a plug-and-play method that enhances the attention mechanism to better preserve contextual information and introduces a confidence-aware penalty during decoding to suppress repetitions triggered by uncertain context representations. Experimental results demonstrate that CoTA effectively alleviates repetitive outputs while consistently improving performance across general tasks.

📝 Abstract
Recent diffusion-based Multimodal Large Language Models (dMLLMs) suffer from high inference latency and therefore rely on caching techniques to accelerate decoding. However, the application of cache mechanisms often introduces undesirable repetitive text generation, a phenomenon we term the **Repeat Curse**. To investigate the underlying mechanism behind this issue, we analyze repetitive generation through the lens of information flow. Our work reveals three key findings: (1) context tokens aggregate semantic information as anchors and guide the final predictions; (2) as information propagates across layers, the entropy of context tokens converges in deeper layers, reflecting the model's growing prediction certainty; (3) repetition is typically linked to disruptions in the information flow of context tokens and to the failure of their entropy to converge in deeper layers. Based on these insights, we present **CoTA**, a plug-and-play method for mitigating repetition. CoTA enhances the attention of context tokens to preserve intrinsic information-flow patterns, while introducing a penalty term on the confidence score during decoding to avoid outputs driven by uncertain context tokens. In extensive experiments, CoTA demonstrates significant effectiveness in alleviating repetition and achieves consistent performance improvements on general tasks. Code is available at https://github.com/ErikZ719/CoTA
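The penalty described in the abstract can be pictured with a minimal sketch (not the authors' code; all names, the `alpha` hyperparameter, and the exact form of the penalty are assumptions). In diffusion decoding, masked positions are typically unmasked in order of model confidence; here that confidence is reduced when the final-layer distributions of context tokens have high entropy, i.e. have not converged:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability distribution along the last axis."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def penalized_confidence(token_probs, context_probs, alpha=0.5):
    """
    token_probs:   (num_masked, vocab) predicted distributions at masked slots.
    context_probs: (num_context, vocab) final-layer distributions of context
                   tokens; high entropy signals failed convergence.
    alpha:         penalty strength (assumed hyperparameter).
    """
    confidence = token_probs.max(axis=-1)        # raw decoding confidence
    ctx_entropy = entropy(context_probs).mean()  # uncertainty of context anchors
    return confidence - alpha * ctx_entropy      # penalized score

# Toy usage with random distributions.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=4)   # 4 masked positions, vocab of 10
ctx = rng.dirichlet(np.ones(10), size=3)     # 3 context tokens
scores = penalized_confidence(probs, ctx)
order = np.argsort(-scores)                  # unmask highest-scoring slots first
```

Since entropy is nonnegative, the penalized score is never above the raw confidence, so positions decoded while context anchors are uncertain are deferred rather than committed.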
Problem

Research questions and friction points this paper is trying to address.

repetition
dMLLMs
information flow
context tokens
cache mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context Tokens
Information Flow
Repetition Curse
Entropy Convergence
CoTA