Understanding the Repeat Curse in Large Language Models from a Feature Perspective

📅 2025-04-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from the "Repeat Curse," yet its underlying mechanisms remain poorly understood. This paper introduces Duplicatus Charm, the first method grounded in explainable AI (XAI) to systematically identify and validate the neural features causally responsible for repetitive text generation (termed Repetition Features). Using a pipeline that combines sparse autoencoders (SAEs), logit attribution analysis, targeted activation ablation, and a purpose-built, quantitatively evaluated repetition dataset, the authors localize repetition-sensitive layers and features. Empirical results show that ablating these features significantly mitigates repetition, with consistent efficacy across diverse LLMs. Crucially, the work establishes, for the first time at the feature level, a causal explanation of repetition phenomena, demonstrating both their interpretability and controllability. The findings provide foundational theoretical insight and actionable levers for designing robust decoding strategies and performing targeted model editing.


📝 Abstract
Large language models (LLMs) have made remarkable progress in various domains, yet they often suffer from repetitive text generation, a phenomenon we refer to as the "Repeat Curse". While previous studies have proposed decoding strategies to mitigate repetition, the underlying mechanism behind this issue remains insufficiently explored. In this work, we investigate the root causes of repetition in LLMs through the lens of mechanistic interpretability. Inspired by recent advances in Sparse Autoencoders (SAEs), which enable monosemantic feature extraction, we propose a novel approach, "Duplicatus Charm", to induce and analyze the Repeat Curse. Our method systematically identifies "Repetition Features", the key model activations responsible for generating repetitive outputs. First, we locate the layers most involved in repetition through logit analysis. Next, we extract and stimulate relevant features using SAE-based activation manipulation. To validate our approach, we construct a repetition dataset covering token- and paragraph-level repetitions and introduce an evaluation pipeline to quantify the influence of identified repetition features. Furthermore, by deactivating these features, we effectively mitigate the Repeat Curse.
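The core mechanism described above (extract sparse features from an activation with an SAE, zero out a target feature, then reconstruct) can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: the weights are random stand-ins for a trained SAE, and the names `sae_encode` and `ablate_feature` are hypothetical.

```python
import numpy as np

# Toy sparse autoencoder: in the paper's setting these weights would come
# from an SAE trained on a real model's residual-stream activations.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
W_enc = rng.normal(size=(d_model, d_sae))   # encoder weights
b_enc = np.zeros(d_sae)                     # encoder bias
W_dec = rng.normal(size=(d_sae, d_model))   # decoder weights

def sae_encode(x):
    # ReLU encoder yields sparse, (ideally) monosemantic feature activations
    return np.maximum(x @ W_enc + b_enc, 0.0)

def ablate_feature(x, feature_idx):
    """Zero one SAE feature and reconstruct the activation without it."""
    f = sae_encode(x)
    f[..., feature_idx] = 0.0   # deactivate the candidate "Repetition Feature"
    return f @ W_dec

x = rng.normal(size=(d_model,))        # stand-in residual-stream activation
x_ablated = ablate_feature(x, feature_idx=3)
```

In the paper's pipeline, the reconstructed activation would be patched back into the forward pass, and the change in repetitive output quantified by the evaluation pipeline.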
Problem

Research questions and friction points this paper is trying to address.

Investigates root causes of repetitive text generation in LLMs
Identifies key model activations causing repetitive outputs
Proposes method to mitigate repetition by deactivating specific features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mechanistic interpretability to analyze repetition
Sparse Autoencoders for feature extraction
Activation manipulation to mitigate repetition
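The logit-analysis step listed above can be illustrated with a logit-lens-style attribution: project each layer's residual-stream contribution through the unembedding matrix and score the logit of the token being repeated. The sketch below uses random tensors in place of real model activations, and `repetition_logit_attribution` is a hypothetical name, not an API from the paper.

```python
import numpy as np

# Toy setup: per-layer residual-stream contributions and an unembedding
# matrix, standing in for activations cached from a real transformer.
rng = np.random.default_rng(1)
n_layers, d_model, vocab = 6, 8, 50
layer_outputs = rng.normal(size=(n_layers, d_model))  # each layer's delta
W_U = rng.normal(size=(d_model, vocab))               # unembedding matrix

def repetition_logit_attribution(layer_outputs, W_U, repeated_token):
    # Direct contribution of each layer to the repeated token's logit
    return layer_outputs @ W_U[:, repeated_token]

scores = repetition_logit_attribution(layer_outputs, W_U, repeated_token=7)
top_layer = int(np.argmax(scores))  # candidate repetition-sensitive layer
```

Layers with the largest attribution to the repeated token would be the ones targeted for SAE feature extraction and ablation.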