🤖 AI Summary
This work addresses the challenge of automatically migrating legacy code to high-performance machine learning domain-specific languages (DSLs), a task traditionally hindered by heavy reliance on manual heuristics and poor scalability. We propose an LLM-driven, probabilistic grammar-guided enumerative synthesis method. Our approach leverages large language models to automatically learn domain-specific migration rules, encoding them as probabilistic context-free grammars (PCFGs) to eliminate manual rule design and hard-coded heuristics. Combined with efficient enumerative synthesis and DSL compiler optimizations, our method outperforms existing state-of-the-art tools across multiple benchmarks: it achieves higher correctness rates, markedly faster synthesis, and strong generalization, without any human intervention throughout the process.
📝 Abstract
Domain-specific languages (DSLs) for machine learning are revolutionizing the speed and efficiency of machine learning workloads by giving users easy access to high-performance compiler optimizations and accelerators. However, to take advantage of these capabilities, a user must first translate their legacy code from its current language into the new DSL. Several recent works have identified this problem of automatically lifting code into these DSLs, proposing program synthesis as a solution. However, synthesis is expensive and struggles to scale without carefully designed, hard-wired heuristics. In this paper, we present an approach to lifting that combines enumerative synthesis with a Large Language Model used to automatically learn the domain-specific heuristics for program lifting, in the form of a probabilistic grammar. Our approach outperforms state-of-the-art tools in this area despite using only learned heuristics.
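To make the core idea concrete, here is a minimal, self-contained sketch of probabilistic-grammar-guided enumerative synthesis. The grammar, its probabilities, and the helper names (`PCFG`, `enumerate_programs`, `synthesize`) are illustrative inventions, not the paper's actual implementation: in the paper, an LLM would supply the production probabilities for a real DSL, whereas this toy assigns them by hand to a tiny expression grammar and enumerates candidates in decreasing-probability order until one matches the input-output specification.

```python
import heapq
import itertools
import math

# Toy PCFG over integer expressions in one variable x.
# Each production: (probability, right-hand side as a list of symbols).
# Probabilities for each nonterminal sum to 1; in the paper's setting,
# an LLM would learn these weights for the target DSL.
PCFG = {
    "E": [
        (0.4, ["x"]),
        (0.2, ["1"]),
        (0.2, ["(", "E", "+", "E", ")"]),
        (0.2, ["(", "E", "*", "E", ")"]),
    ]
}

def enumerate_programs(start="E", limit=20000):
    """Yield complete programs in decreasing-probability (best-first) order."""
    tie = itertools.count()  # tie-breaker so heapq never compares lists
    heap = [(0.0, next(tie), [start])]  # cost = -log(probability)
    for _ in range(limit):
        if not heap:
            return
        cost, _, form = heapq.heappop(heap)
        # Expand the leftmost nonterminal, if any remains.
        idx = next((i for i, s in enumerate(form) if s in PCFG), None)
        if idx is None:
            yield "".join(form)  # fully terminal: a complete program
            continue
        for prob, rhs in PCFG[form[idx]]:
            child = form[:idx] + rhs + form[idx + 1:]
            heapq.heappush(heap, (cost - math.log(prob), next(tie), child))

def synthesize(examples):
    """Return the most probable program consistent with all (x, y) examples."""
    for prog in enumerate_programs():
        if all(eval(prog, {"x": x}) == y for x, y in examples):
            return prog
    return None

# Specification: behave like f(x) = 2*x + 1 on a few inputs.
print(synthesize([(0, 1), (1, 3), (5, 11)]))
# e.g. ((x+x)+1); the exact form depends on tie-breaking among
# equally probable candidates.
```

Because the search is ordered by grammar probability rather than plain program size, productions the (learned) grammar considers likely are tried first, which is what lets the learned heuristics replace hand-wired ones in scaling the enumeration.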