🤖 AI Summary
This work challenges the foundational assumption in program synthesis that humans can adequately teach machines, by investigating whether non-expert humans can effectively instruct a machine to program via input-output examples. Method: We conduct a human-subjects study across six fundamental programming concepts, collecting example sets from both non-experts and experts, and systematically evaluate the generalization performance of five state-of-the-art synthesizers (including search-based and neural-guided approaches) under each example source. Contribution/Results: Non-expert examples are of markedly lower quality: synthesizers achieve substantially lower average accuracy on them than on expert-provided or randomly sampled examples, and only 35% of non-expert task sets are successfully generalized by any synthesizer. To our knowledge, this is the first empirical study to demonstrate that non-expert examples are generally insufficient for reliable program synthesis. Our findings provide critical empirical grounding for human-centered modeling and system design in interactive program synthesis, highlighting concrete directions for improvement.
📝 Abstract
The goal of inductive program synthesis is for a machine to automatically generate a program from user-supplied examples. A key underlying assumption is that humans can provide sufficient examples to teach a concept to a machine. To evaluate the validity of this assumption, we conduct a study where human participants provide examples for six programming concepts, such as finding the maximum element of a list. We evaluate the generalisation performance of five program synthesis systems given input-output examples (i) provided by non-expert humans, (ii) provided by a human expert, and (iii) sampled at random. Our results suggest that non-experts typically do not provide sufficient examples for a program synthesis system to learn an accurate program.
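To make the failure mode concrete, here is a minimal, hypothetical sketch (not one of the five systems studied): an enumerative synthesizer that returns the first candidate program consistent with every input-output example. When the example set under-constrains the search, as non-expert example sets often do, the synthesizer settles on a program that fits the examples but not the intended concept (here, "maximum element of a list").

```python
def synthesize(examples, candidates):
    """Return the first (name, program) pair consistent with all examples.

    examples   -- list of (input, expected_output) pairs
    candidates -- ordered list of (name, callable) candidate programs
    """
    for name, prog in candidates:
        if all(prog(x) == y for x, y in examples):
            return name, prog
    return None


# A deliberately tiny candidate pool over list-processing concepts.
CANDIDATES = [
    ("first", lambda xs: xs[0]),
    ("last", lambda xs: xs[-1]),
    ("maximum", lambda xs: max(xs)),
]

# Under-specified examples: 'first' and 'maximum' both fit, so the
# enumerator returns 'first' -- the wrong generalisation of "maximum".
sparse = [([5], 5), ([9, 1], 9)]

# Adding one example where the maximum is neither first nor last
# rules out the competing candidates.
informative = [([5], 5), ([1, 9, 2], 9)]
```

Running `synthesize(sparse, CANDIDATES)` yields `"first"`, while `synthesize(informative, CANDIDATES)` yields `"maximum"`: the single well-chosen extra example is what separates the intended concept from spurious ones, which is exactly the discriminating power the study finds non-expert example sets often lack.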