🤖 AI Summary
This work addresses the limited capability of large language models (LLMs) to reason and optimize under physical and operational constraints by proposing the first systematic evaluation framework tailored for optimal power flow (OPF) tasks. The framework employs structured inputs, multi-skill task design, and quantitative metrics to comprehensively assess mainstream LLMs on core competencies including arithmetic computation, constraint modeling, and optimization reasoning. Experimental results demonstrate that even state-of-the-art LLMs perform poorly on most OPF tasks, with significant degradation under high-complexity constraints. These findings reveal critical limitations in applying current LLMs to real-world power system optimization scenarios and establish a foundational benchmark to guide future research directions.
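To make the "quantitative metrics" concrete, here is a minimal, hypothetical sketch (not the paper's actual evaluation code) of the kind of feasibility check a grader might run on an LLM-proposed generator dispatch, using a lossless DC-style power balance and generator limits; all function names, limits, and tolerances are illustrative assumptions:

```python
import numpy as np

def violation_metrics(p_gen, p_min, p_max, p_demand):
    """Per-constraint violation magnitudes for a candidate dispatch (illustrative sketch).

    p_gen: proposed active-power output per generator (MW)
    p_min, p_max: generator limits (MW)
    p_demand: total system demand (MW); losses ignored in this DC-style sketch
    """
    p_gen = np.asarray(p_gen, dtype=float)
    # Generation-limit violations (zero when the dispatch is within bounds).
    lower_viol = np.maximum(p_min - p_gen, 0.0)
    upper_viol = np.maximum(p_gen - p_max, 0.0)
    # Power-balance violation: total generation must match total demand.
    balance_viol = abs(p_gen.sum() - p_demand)
    return {
        "max_limit_violation_mw": float(np.maximum(lower_viol, upper_viol).max()),
        "balance_violation_mw": float(balance_viol),
        "feasible": bool(
            np.all(lower_viol == 0) and np.all(upper_viol == 0) and balance_viol < 1e-3
        ),
    }

# Example: a 3-generator case where the candidate answer overloads generator 2.
print(violation_metrics(
    p_gen=[120.0, 310.0, 70.0],
    p_min=np.array([50.0, 100.0, 20.0]),
    p_max=np.array([200.0, 300.0, 100.0]),
    p_demand=500.0,
))
```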
📝 Abstract
Large Language Models (LLMs) have demonstrated strong capabilities across diverse natural language tasks, yet their ability to solve abstraction and optimization problems under constraints remains largely unexplored. In this paper, we investigate whether LLMs can reason and optimize under the physical and operational constraints of the Optimal Power Flow (OPF) problem. We introduce a challenging evaluation setup that requires a set of fundamental skills such as reasoning, structured input handling, arithmetic, and constrained optimization. Our evaluation reveals that state-of-the-art (SoTA) LLMs fail on most of the tasks, and that even reasoning-focused LLMs still fail in the most complex settings. These findings highlight critical gaps in LLMs' ability to handle structured reasoning under constraints, and this work provides a rigorous testing environment for developing more capable LLM assistants that can tackle real-world power grid optimization problems.
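For readers unfamiliar with OPF, the canonical AC-OPF formulation below sketches the kinds of physical and operational constraints the abstract refers to; the notation is standard textbook usage, assumed here for illustration rather than taken from the paper:

$$
\begin{aligned}
\min_{P_G,\, Q_G,\, |V|,\, \theta} \quad & \sum_{i \in \mathcal{G}} c_i\!\left(P_{G_i}\right) \\
\text{s.t.} \quad & P_{G_i} - P_{D_i} = |V_i| \sum_{j} |V_j| \left( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} \right) \\
& Q_{G_i} - Q_{D_i} = |V_i| \sum_{j} |V_j| \left( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \right) \\
& P_{G_i}^{\min} \le P_{G_i} \le P_{G_i}^{\max}, \quad Q_{G_i}^{\min} \le Q_{G_i} \le Q_{G_i}^{\max} \\
& V_i^{\min} \le |V_i| \le V_i^{\max}, \quad |S_{ij}| \le S_{ij}^{\max}
\end{aligned}
$$

Here $c_i$ is the generation-cost function of generator $i$, $P_{D_i}$ and $Q_{D_i}$ are the active and reactive demand at bus $i$, $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the bus admittance matrix, $\theta_{ij} = \theta_i - \theta_j$, and $S_{ij}$ is the apparent power flow on line $(i, j)$.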