Solving the LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions

πŸ“… 2025-12-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Large language models (LLMs) deployed in production frequently suffer from degenerate repetition due to suboptimal decoding strategies, causing severe performance degradation and service unavailability. This work focuses on batched code interpretation tasks and systematically identifies three prevalent repetition patterns: business rule generation, method call relationship analysis, and PlantUML diagram syntax generation. Through Markov modeling, we reveal that greedy decoding inherently fails to escape repetitive cycles, and establish early_stopping as the critical parameter enabling effective termination in beam search. We propose a hierarchical mitigation strategy: (i) beam search with early_stopping=True universally resolves all three repetition types; (ii) presence_penalty alleviates business-rule–induced repetition; and (iii) DPO-based fine-tuning enhances model-level robustness against repetition. All methods are rigorously validated in real-world production deployments, yielding substantial improvements in system stability and inference efficiency.
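The Markov-model argument can be illustrated with a toy sketch (the transition table below is hypothetical, not taken from the paper): once the per-state argmax transitions form a cycle, greedy decoding follows that cycle forever and never emits the end-of-sequence token, even when `<eos>` carries substantial probability at every step.

```python
# Toy Markov chain over tokens; values are next-token probabilities.
# The table is hypothetical, chosen so that the greedy argmax forms a
# cycle A -> B -> A and never reaches the end token <eos>.
transitions = {
    "A": {"B": 0.6, "<eos>": 0.4},
    "B": {"A": 0.7, "<eos>": 0.3},
}

def greedy_decode(start, max_steps=10):
    out, tok = [start], start
    for _ in range(max_steps):
        tok = max(transitions[tok], key=transitions[tok].get)  # greedy argmax
        out.append(tok)
        if tok == "<eos>":
            break
    return out

print(greedy_decode("A"))  # alternates A/B until max_steps; <eos> never appears
```

Sampling or beam-based decoding can escape such a cycle because lower-ranked continuations (here `<eos>`) are kept in play rather than discarded at every step.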

πŸ“ Abstract
The repetition problem, where Large Language Models (LLMs) continuously generate repetitive content without proper termination, poses a critical challenge in production deployments, causing severe performance degradation and system stalling. This paper presents a comprehensive investigation and multiple practical solutions for the repetition problem encountered in real-world batch code interpretation tasks. We identify three distinct repetition patterns: (1) business rule generation repetition, (2) method call relationship analysis repetition, and (3) PlantUML diagram syntax generation repetition. Through rigorous theoretical analysis based on Markov models, we establish that the root cause lies in greedy decoding's inability to escape repetitive loops, exacerbated by self-reinforcement effects. Our comprehensive experimental evaluation demonstrates three viable solutions: (1) Beam Search decoding with early_stopping=True serves as a universal post-hoc mechanism that effectively resolves all three repetition patterns; (2) the presence_penalty hyperparameter provides an effective solution specifically for BadCase 1 (business rule generation repetition); and (3) Direct Preference Optimization (DPO) fine-tuning offers a universal model-level solution for all three BadCases. The primary value of this work lies in combining first-hand production experience with extensive experimental validation. Our main contributions include a systematic theoretical analysis of repetition mechanisms, a comprehensive evaluation of multiple solutions with task-specific applicability mapping, identification of early_stopping as the critical parameter for Beam Search effectiveness, and practical production-ready solutions validated in real deployment environments.
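The presence_penalty mechanism referenced above can be sketched as follows (OpenAI-style semantics: a flat constant is subtracted from the logit of every token that has already appeared in the output, regardless of how often; the logits and penalty value here are hypothetical):

```python
# Hypothetical next-token logits while the model is mid-loop: the
# repeated token "B" has the highest raw logit.
logits = {"B": 2.0, "<eos>": 1.5, "C": 0.5}
generated = ["A", "B", "A", "B"]   # tokens emitted so far

presence_penalty = 0.8             # flat penalty per already-seen token

# Subtract the penalty once from every token present in the output so
# far (presence, not frequency: the count of occurrences is ignored).
penalized = {
    tok: logit - (presence_penalty if tok in generated else 0.0)
    for tok, logit in logits.items()
}

print(max(logits, key=logits.get))        # raw argmax stays on "B"
print(max(penalized, key=penalized.get))  # penalized argmax flips to "<eos>"
```

This illustrates why the parameter helps with loop-style repetition: a sufficiently large penalty lets the termination token overtake the repeated token, but an oversized value can also suppress legitimately repeated content, which is consistent with it resolving only one of the three BadCases.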
Problem

Research questions and friction points this paper is trying to address.

Addresses LLM repetition in production code tasks
Identifies three specific repetition patterns in batch interpretation
Evaluates solutions like Beam Search and DPO fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Beam Search decoding with early_stopping resolves repetition patterns
presence_penalty hyperparameter addresses business rule generation repetition
Direct Preference Optimization fine-tuning provides universal model-level solution
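A minimal beam-search sketch (toy log-probabilities, not the paper's model) of why early_stopping is the critical parameter: the beam retains the `<eos>` hypothesis that greedy discards, and with early_stopping=True the search terminates as soon as enough finished hypotheses exist instead of expanding the looping path until the step limit:

```python
# Toy next-token log-probabilities (hypothetical), chosen so that the
# greedy path loops A -> B -> A ... while beam search keeps <eos>
# hypotheses alive and can terminate on them.
log_probs = {
    "A": {"B": -0.5, "<eos>": -0.9},
    "B": {"A": -0.4, "<eos>": -1.1},
}

def beam_search(start, num_beams=2, max_steps=6, early_stopping=True):
    beams = [(0.0, [start])]           # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_steps):
        # Expand every live beam by every possible next token.
        candidates = []
        for score, seq in beams:
            for tok, lp in log_probs[seq[-1]].items():
                candidates.append((score + lp, seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the top hypotheses, routing ended ones to `finished`.
        beams = []
        for score, seq in candidates:
            if seq[-1] == "<eos>":
                finished.append((score, seq))
            else:
                beams.append((score, seq))
            if len(beams) == num_beams:
                break
        # early_stopping=True: stop once num_beams finished hypotheses
        # exist, rather than exploring all the way to max_steps.
        if early_stopping and len(finished) >= num_beams:
            break
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]

print(beam_search("A"))  # returns the terminated hypothesis ["A", "<eos>"]
```

In this toy setup the returned sequence is the same with early_stopping=False, but the search runs to max_steps; in production-scale generation that wasted expansion of the repeating path is exactly the stalling behavior the paper attributes to the parameter being off.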
Weiwei Wang
Anhui University

Weijie Zou
Shenzhen Sunline Tech Co., Ltd

Jiyong Min
Shenzhen Sunline Tech Co., Ltd