RuleFlow: Generating Reusable Program Optimizations with LLMs

📅 2026-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing compilers offer limited support for optimizing Pandas programs, while directly employing large language models (LLMs) for per-program optimization is both costly and unreliable. This work proposes a three-stage hybrid approach: first leveraging an LLM to discover specific optimization instances, then generalizing these into reusable rewrite rules through program synthesis, and finally integrating the rules into a compiler for automatic application. This method represents the first effort to transform LLM-generated optimizations into compiler-embeddable, reusable rules, effectively decoupling optimization discovery from deployment and thereby achieving both flexibility and reliability. Evaluated on PandasBench, the approach achieves speedups of up to 4.3× over the state-of-the-art compiler Dias and up to 1914.9× over the system-level solution Modin.

📝 Abstract

Optimizing Pandas programs is a challenging problem. Existing systems and compiler-based approaches offer reliability but are either heavyweight or support only a limited set of optimizations. Conversely, per-program optimization with LLMs can synthesize nontrivial optimizations, but it is unreliable, expensive, and low-yield. In this work, we introduce a hybrid three-stage approach that decouples discovery from deployment and connects them via a novel bridge. First, an LLM discovers per-program optimizations (discovery). Second, these optimizations are converted into generalized rewrite rules (bridge). Finally, the rules are incorporated into a compiler that applies them automatically wherever applicable, eliminating repeated reliance on LLMs (deployment). We demonstrate that RuleFlow is the new state-of-the-art (SOTA) Pandas optimization framework on PandasBench, a challenging Pandas benchmark consisting of Python notebooks. Across these notebooks, we achieve a speedup of up to 4.3x over Dias, the previous compiler-based SOTA, and 1914.9x over Modin, the previous systems-based SOTA. Our code is available at https://github.com/ADAPT-uiuc/RuleFlow.
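To make the deployment stage concrete, here is a minimal sketch of what applying a single rewrite rule to Pandas source code could look like. It is an illustration only, not RuleFlow's rule format or compiler: the class `SortHeadToNsmallest` and the helper `apply_rules` are hypothetical names, and the rule itself (rewriting `X.sort_values(C).head(N)` to `X.nsmallest(N, C)`) is chosen because the pandas documentation describes the two forms as equivalent, with `nsmallest` being more performant. The sketch works purely on syntax with Python's standard `ast` module and assumes `X` is a DataFrame; a real compiler would also have to check that precondition.

```python
import ast


class SortHeadToNsmallest(ast.NodeTransformer):
    """Hypothetical rule: X.sort_values(C).head(N)  ->  X.nsmallest(N, C)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        outer = node.func
        # Match the outer call: <expr>.head(N) with exactly one positional argument.
        if not (isinstance(outer, ast.Attribute) and outer.attr == "head"
                and len(node.args) == 1 and not node.keywords):
            return node
        inner = outer.value
        # Match the inner call: X.sort_values(C) with exactly one positional argument.
        # Calls with keywords (e.g. ascending=False) are deliberately left alone.
        if not (isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Attribute)
                and inner.func.attr == "sort_values"
                and len(inner.args) == 1 and not inner.keywords):
            return node
        # Build the replacement call X.nsmallest(N, C).
        new_call = ast.Call(
            func=ast.Attribute(value=inner.func.value, attr="nsmallest", ctx=ast.Load()),
            args=[node.args[0], inner.args[0]],
            keywords=[],
        )
        return ast.copy_location(new_call, node)


def apply_rules(source: str) -> str:
    """Parse source, apply the rule everywhere it matches, and return new source."""
    tree = SortHeadToNsmallest().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


if __name__ == "__main__":
    print(apply_rules("top = df.sort_values('price').head(5)"))
    print(apply_rules("top = df.sort_values('price', ascending=False).head(5)"))
```

The first call is rewritten and the second is left unchanged because it falls outside the rule's pattern. This mirrors the idea in the abstract that once a rule exists, it can be applied wherever it matches without consulting an LLM again.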

Problem

Research questions and friction points this paper is trying to address.

Pandas optimization

program optimization

large language models

compiler-based optimization

rewrite rules

Innovation

Methods, ideas, or system contributions that make the work stand out.

RuleFlow

program optimization

rewrite rules