AI Summary
Large language models (LLMs) suffer from limited diversity and novelty in generative tasks, primarily due to knowledge gaps, outdated information, and modality limitations in their training data, hindering their effectiveness in AI-scientist and creative-agent applications that require multi-perspective reasoning and exploratory thinking. To address this, we propose *Inference-time Multi-perspective Brainstorming*, a novel, fine-tuning-free inference-time mechanism for injecting multimodal perspectives into prompting. Our approach integrates textual and visual inputs to enrich the prompt context and is model-agnostic, deployable with both open- and closed-source models. Leveraging multi-view prompt engineering, cross-modal context enhancement, chain-of-thought initialization, and zero-shot inference optimization, it significantly improves output novelty (+37.2%) and perspective diversity (+41.5%), while markedly reducing repetition and strengthening exploratory reasoning capabilities.
Abstract
Large Language Models (LLMs) demonstrate remarkable proficiency in generating accurate and fluent text. However, they often struggle with diversity and novelty, leading to repetitive or overly deterministic responses. These limitations stem from constraints in training data, including gaps in specific knowledge domains, outdated information, and an over-reliance on textual sources. Such shortcomings reduce their effectiveness in tasks requiring creativity, multi-perspective reasoning, and exploratory thinking, such as LLM-based AI scientist agents and creative artist agents. To address this challenge, we introduce an inference-time multi-view brainstorming method, a novel approach that enriches input prompts with diverse perspectives derived from both textual and visual sources, which we refer to as "Multi-Novelty". By incorporating additional contextual information as diverse starting points for chains of thought, this method enhances the variety and creativity of generated outputs. Importantly, our approach is model-agnostic, requiring no architectural modifications and remaining compatible with both open-source and proprietary LLMs.
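The abstract describes enriching an input prompt with diverse perspectives that serve as distinct starting points for chains of thought. The following is a minimal, hypothetical sketch of that idea: the function name `build_brainstorm_prompt` and the prompt wording are illustrative assumptions, not the paper's actual implementation, and the perspective strings stand in for retrieved textual snippets or captions of visual sources.

```python
# Illustrative sketch only: the paper does not specify this exact prompt
# format; perspective strings stand in for textual and visual-source context.

def build_brainstorm_prompt(task: str, perspectives: list[str]) -> str:
    """Compose an enriched prompt in which each perspective seeds a
    separate chain-of-thought starting point before the final answer."""
    lines = [f"Task: {task}", "",
             "Consider the task from each perspective before answering:"]
    for i, view in enumerate(perspectives, start=1):
        lines.append(f"Perspective {i}: {view}")
        lines.append(f"Thought starter {i}: reason from this perspective first.")
    lines.append("")
    lines.append("Now synthesize a novel answer drawing on all perspectives.")
    return "\n".join(lines)

prompt = build_brainstorm_prompt(
    "Propose a new experiment on battery degradation",
    [
        "Materials view: electrode microstructure (textual source)",
        "Visual view: caption of an SEM image showing dendrite growth",
    ],
)
print(prompt)
```

Because the enrichment happens entirely in the prompt string, the resulting text can be sent unchanged to any open-source or proprietary LLM, which is what makes the approach model-agnostic.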