🤖 AI Summary
Large language models (LLMs) suffer from insufficient output diversity: repeated sampling yields highly redundant generations, hindering applications that require creativity or reasoning. This paper proposes G2, a training-free, plug-and-play decoding intervention that pairs a base generator with a dual-guidance module, integrating prompt alignment and distributional control during decoding and thereby decoupling diversity enhancement from quality optimization. The core innovation is a dual-Guide mechanism: one Guide enforces semantic consistency with the input prompt, while the other regulates the entropy of the output distribution, allowing G2 to be attached to any pretrained LLM without fine-tuning. Experiments across diverse generation tasks show that G2 substantially improves diversity (e.g., reducing Self-BLEU by 23.6%) while preserving fluency and semantic fidelity, and that it consistently outperforms standard baselines, including temperature scaling, on all evaluated metrics.
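The summary does not give G2's exact update rule, but the general shape of a logit-level dual-guide intervention can be sketched as follows: one term pulls the base model's logits toward a prompt-aligned guide (semantic consistency), while a temperature-like parameter raises the entropy of the next-token distribution. All function names and parameters below (`guided_logits`, `alpha`, `tau`) are illustrative assumptions, not G2's actual formulation:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def guided_logits(base_logits, prompt_guide_logits, alpha=0.5, tau=1.5):
    """Hypothetical dual-guide step: interpolate the base logits toward a
    prompt-aligned guide (semantic consistency), then rescale by tau to
    control the entropy of the next-token distribution.
    This is a generic sketch, not G2's published update rule."""
    aligned = base_logits + alpha * (prompt_guide_logits - base_logits)
    return aligned / tau  # tau > 1 flattens the distribution (more entropy)
```

With `tau > 1`, the resulting distribution is strictly flatter than the base model's, which is the entropy-regulation role the second Guide plays in the summary above.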
📝 Abstract
Large Language Models (LLMs) have demonstrated exceptional performance across diverse natural language processing tasks. However, these models exhibit a critical limitation in output diversity, often generating highly similar content across multiple attempts. This limitation significantly affects tasks that require varied outputs, from creative writing to reasoning. Existing solutions, such as temperature scaling, enhance diversity by modifying probability distributions but compromise output quality. We propose Guide-to-Generation (G2), a training-free, plug-and-play method that enhances output diversity while preserving generation quality. G2 employs a base generator alongside dual Guides that steer the generation process through decoding-based interventions, encouraging more diverse outputs conditioned on the original query. Comprehensive experiments demonstrate that G2 effectively improves output diversity while maintaining a strong balance between diversity and quality.
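Self-BLEU, the diversity metric cited in the summary (lower means more diverse), scores each sampled generation with BLEU against the other samples as references, then averages. A minimal sketch, using a simplified BLEU (clipped n-gram precisions up to bigrams, no brevity penalty or smoothing), which is an assumption for illustration rather than the paper's exact evaluation code:

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, references, max_n=2):
    """Geometric mean of clipped n-gram precisions (no brevity penalty,
    no smoothing) -- a simplified stand-in for full BLEU."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        if not cand_ngrams:
            return 0.0
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for ng, c in ngram_counts(ref, n).items():
                max_ref[ng] = max(max_ref[ng], c)
        clipped = sum(min(c, max_ref[ng]) for ng, c in cand_ngrams.items())
        if clipped == 0:
            return 0.0
        log_sum += math.log(clipped / sum(cand_ngrams.values()))
    return math.exp(log_sum / max_n)

def self_bleu(generations, max_n=2):
    """Average BLEU of each generation against all the others; lower
    values indicate a more diverse set of samples."""
    scores = [simple_bleu(g, generations[:i] + generations[i + 1:], max_n)
              for i, g in enumerate(generations)]
    return sum(scores) / len(scores)
```

A set of identical samples scores 1.0 and a set with no shared n-grams scores 0, so the 23.6% Self-BLEU reduction reported for G2 corresponds to markedly less n-gram overlap across repeated samples.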