AI Summary
Current evaluations of large language models are largely confined to single-turn text generation, failing to adequately assess agents' divergent thinking in interactive settings. To address this limitation, this work introduces MUTATE, an interactive benchmark that, for the first time, quantifies divergent thinking at both the path and action levels. The authors propose ReDNA, a two-stage reasoning framework that decouples unconstrained divergent generation from convergent selection, thereby mitigating action fixation caused by premature convergence pressure. By integrating mechanism-switching action modeling with multi-path exploration scoring, ReDNA substantially outperforms existing approaches on MUTATE and external creative tasks, demonstrating its capacity to achieve qualitative gains in creativity through resilient divergent reasoning.
Abstract
Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) treat them as single-turn text generators, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to evaluate agentic divergent thinking at two levels: the path level, where an agent discovers multiple alternative paths to the same goal, and the action level, where individual actions require non-typical, mechanism-shifting object uses. Unlike success-only evaluations, MUTATE scores both completed paths and off-path attempts, capturing divergent reasoning that conventional success rates discard. Our experiments with frontier LLMs reveal a structural blind spot in existing frameworks: when exposed to immediate convergence pressure, they tend to fall into action fixation and fail to improve action-level divergence. To overcome this, we propose ReDNA, which separates unconstrained divergent candidate generation from convergent constraint selection. ReDNA significantly outperforms prior methods across both divergence levels and generalizes effectively to an external creativity environment. We also confirm that its success stems from a qualitative enhancement of resilient divergent reasoning rather than simple environmental exploration.
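The separation described above, unconstrained candidate generation followed by constraint-based selection, can be illustrated with a minimal sketch. This is not the authors' implementation of ReDNA: the function `two_stage_select`, its callbacks (`propose`, `is_feasible`, `novelty`), and the toy data are all hypothetical names chosen for illustration, and the rule of picking the most novel feasible action is an assumption.

```python
from typing import Callable, List, Optional


def two_stage_select(
    state: str,
    propose: Callable[[str], List[str]],
    is_feasible: Callable[[str], bool],
    novelty: Callable[[str], float],
) -> Optional[str]:
    """Hypothetical two-stage action choice (illustrative only)."""
    # Stage 1 (divergent): gather candidates with no constraint applied yet.
    candidates = propose(state)
    # Stage 2 (convergent): apply constraints only after generation,
    # then keep the most novel action that survives (assumed rule).
    feasible = [c for c in candidates if is_feasible(c)]
    if not feasible:
        return None
    return max(feasible, key=novelty)


# Toy data, invented for this example.
TOY_CANDIDATES = ["unlock with key", "pry lid with spoon", "melt lock with ice"]
TOY_NOVELTY = {
    "unlock with key": 0.1,
    "pry lid with spoon": 0.8,
    "melt lock with ice": 0.9,
}
TOY_FEASIBLE = {"unlock with key", "pry lid with spoon"}

choice = two_stage_select(
    "sealed box",
    propose=lambda s: list(TOY_CANDIDATES),
    is_feasible=lambda a: a in TOY_FEASIBLE,
    novelty=lambda a: TOY_NOVELTY[a],
)
print(choice)  # prints "pry lid with spoon"
```

In this toy run the most novel candidate is rejected as infeasible, so the atypical but workable use of the spoon is chosen over the typical key. A single-stage policy that filtered for feasibility while generating might never have proposed the spoon at all, which is the kind of fixation the abstract attributes to early convergence pressure.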