🤖 AI Summary
Existing program synthesis methods exhibit limited robustness under few-shot training and on out-of-distribution (OOD) test inputs. To address this, we propose a test-time transductive program synthesis framework that reformulates synthesis as active learning over a finite hypothesis space. Our approach initializes candidate hypotheses via large language models, selects maximally informative test inputs as queries using a greedy maximin algorithm, and prunes candidate programs whose outputs are inconsistent with the LLM-predicted outputs, enabling efficient search. This design significantly improves synthesis robustness and generalization to edge cases. Empirical evaluation on the Playgol and MBPP+ benchmarks demonstrates superior accuracy and query efficiency compared to state-of-the-art methods.
📝 Abstract
We introduce transductive program synthesis, a new formulation of the program synthesis task that explicitly leverages test inputs during synthesis. While prior approaches to program synthesis, whether based on natural language descriptions or input-output examples, typically aim to generalize from training examples, they often struggle with robustness, especially in real-world settings where training examples are limited and test inputs involve various edge cases. To address this, we propose a novel framework that improves robustness by treating synthesis as active learning over a finite hypothesis class defined by programs' outputs. We use an LLM to predict outputs for selected test inputs and eliminate inconsistent hypotheses, where the inputs are chosen via a greedy maximin algorithm to minimize the number of LLM queries required. We evaluate our approach on two real-world datasets: Playgol, a string transformation benchmark, and MBPP+, a Python code generation benchmark. We demonstrate that our method significantly improves program synthesis in both accuracy and efficiency. We release our code at https://github.com/klee972/SYNTRA.
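To make the active-learning loop concrete, here is a minimal Python sketch of the idea under our own assumptions: `hypotheses` are callable candidate programs, `llm_predict_output` is a hypothetical stand-in for querying the LLM for an output, `run_safely` is a helper we introduce for illustration, and `budget` caps the number of queries. This is a sketch of the greedy maximin principle, not the authors' released implementation (see the repository linked above for that).

```python
from collections import Counter


def run_safely(program, x):
    """Execute a candidate program, mapping crashes to a sentinel output.
    Outputs are assumed hashable so they can be grouped and compared."""
    try:
        return program(x)
    except Exception:
        return "<error>"


def select_query(hypotheses, test_inputs):
    """Greedy maximin selection: pick the test input whose worst-case
    answer still eliminates the most hypotheses, i.e. the input whose
    largest output-equivalence class is smallest."""
    best_input, best_worst_case = None, -1
    for x in test_inputs:
        # Group surviving hypotheses by the output they produce on x.
        counts = Counter(run_safely(h, x) for h in hypotheses)
        # Worst case: the true output matches the largest group, so only
        # hypotheses outside that group get eliminated.
        worst_case_eliminated = len(hypotheses) - max(counts.values())
        if worst_case_eliminated > best_worst_case:
            best_input, best_worst_case = x, worst_case_eliminated
    return best_input


def transductive_synthesis(hypotheses, test_inputs, llm_predict_output, budget):
    """Repeatedly query the LLM on maximally informative test inputs and
    prune hypotheses whose outputs disagree with the predicted output."""
    remaining = list(test_inputs)
    for _ in range(budget):
        if len(hypotheses) <= 1 or not remaining:
            break
        x = select_query(hypotheses, remaining)
        remaining.remove(x)
        y = llm_predict_output(x)  # assumed LLM oracle for the output on x
        hypotheses = [h for h in hypotheses if run_safely(h, x) == y]
    return hypotheses
```

The maximin choice guarantees that even in the worst case each query eliminates as many hypotheses as possible, which is what keeps the number of LLM queries small.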