AI Summary
This study investigates the internal mechanisms that large language models (LLMs) use in scientific reasoning tasks and how strongly those mechanisms depend on prompting, with the aim of improving model interpretability and safety. To this end, the work proposes prompt optimization as an interpretability tool and introduces a custom variant of the Genetic-Pareto (GEPA) algorithm to systematically optimize scientific reasoning prompts. Combining behavioral analysis with cross-model transfer evaluations, the research reveals that LLMs often rely on "local logic", i.e., model-specific reasoning heuristics that are difficult to generalize across architectures. The findings indicate that performance gains frequently stem from such idiosyncratic structural features, underscoring the importance of characterizing the reasoning mechanisms of individual models. This insight suggests new pathways toward safer, more controllable AI systems grounded in a deeper understanding of model-specific behaviors.
Abstract
As Large Language Models (LLMs) achieve increasingly sophisticated performance on complex reasoning tasks, current architectures serve as critical proxies for the internal heuristics of frontier models. Characterizing emergent reasoning is vital for long-term interpretability and safety. Furthermore, understanding how prompting modulates these processes is essential, as natural language will likely be the primary interface for interacting with AGI systems. In this work, we use a custom variant of Genetic-Pareto (GEPA) to systematically optimize prompts for scientific reasoning tasks, and analyze how prompting can affect reasoning behavior. We investigate the structural patterns and logical heuristics inherent in GEPA-optimized prompts, and evaluate their transferability and brittleness. Our findings reveal that gains in scientific reasoning often correspond to model-specific heuristics that fail to generalize across systems, which we call "local" logic. By framing prompt optimization as a tool for model interpretability, we argue that mapping the reasoning structures that individual LLMs prefer is an important prerequisite for effectively collaborating with superhuman intelligence.
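The idea of testing whether an optimized prompt's gains transfer across models can be made concrete with a small sketch. It assumes accuracies are available for every pair (model the prompt was optimized on, model it is evaluated on), plus a baseline prompt per model, and flags a prompt as relying on "local" logic when it improves its source model but the mean gain on the other models is at most a fraction `frac` of that improvement. The data layout, the helper names `transfer_gains` and `is_local`, and the `frac=0.5` threshold are illustrative assumptions, not the criterion used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: acc[(prompt_source, eval_model)] = accuracy on a held-out set.
# prompt_source is either a model name (prompt optimized on that model) or "baseline".
Acc = Dict[Tuple[str, str], float]


def transfer_gains(acc: Acc, models: List[str]) -> Dict[Tuple[str, str], float]:
    """Gain of each source model's optimized prompt over the baseline prompt, per target model."""
    return {
        (src, tgt): acc[(src, tgt)] - acc[("baseline", tgt)]
        for src in models
        for tgt in models
    }


def is_local(gains: Dict[Tuple[str, str], float], src: str, models: List[str],
             frac: float = 0.5) -> bool:
    """Flag a prompt whose gain on its own model mostly vanishes on the other models."""
    own = gains[(src, src)]
    cross = [gains[(src, tgt)] for tgt in models if tgt != src]
    if own <= 0 or not cross:
        return False  # nothing to transfer, or no other model to compare against
    return sum(cross) / len(cross) <= frac * own
```

Comparing against each target model's own baseline, rather than raw accuracy, separates "this prompt helps" from "this model is simply stronger", so a prompt is only called local when its benefit is specific to the model it was tuned on.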