🤖 AI Summary
This work challenges the foundational assumption in program synthesis that humans can adequately teach machines, by investigating whether non-expert humans can effectively instruct a machine to program via input-output examples. Method: We conduct a human-subjects study across six fundamental programming concepts, collecting example sets from both non-experts and experts, and systematically evaluate the generalization performance of five state-of-the-art synthesizers (including search-based and neural-guided approaches) under each example source. Contribution/Results: Non-expert examples are of markedly lower quality: synthesizers achieve substantially lower average accuracy on them than on expert-provided or randomly sampled examples, and only 35% of non-expert task sets are successfully generalized by any synthesizer. To our knowledge, this is the first empirical study to demonstrate that non-expert examples are generally insufficient for reliable program synthesis. Our findings provide critical empirical grounding for human-centered modeling and system design in interactive program synthesis, highlighting concrete directions for improvement.
📝 Abstract
The goal of inductive program synthesis is for a machine to automatically generate a program from user-supplied examples. A key underlying assumption is that humans can provide sufficient examples to teach a concept to a machine. To evaluate the validity of this assumption, we conduct a study where human participants provide examples for six programming concepts, such as finding the maximum element of a list. We evaluate the generalisation performance of five program synthesis systems given input-output examples (i) provided by non-expert humans, (ii) provided by a human expert, and (iii) sampled at random. Our results suggest that non-experts typically do not provide sufficient examples for a program synthesis system to learn an accurate program.
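To make the failure mode concrete, here is a minimal, hypothetical sketch (not one of the five systems studied): an enumerative synthesizer that returns the first candidate program consistent with every input-output example. When the example set under-constrains the search, as non-expert example sets often do, the synthesizer settles on a program that fits the examples but not the intended concept (here, "maximum element of a list").

```python
def synthesize(examples, candidates):
    """Return the first (name, program) pair consistent with all examples.

    examples   -- list of (input, expected_output) pairs
    candidates -- ordered list of (name, callable) candidate programs
    """
    for name, prog in candidates:
        if all(prog(x) == y for x, y in examples):
            return name, prog
    return None


# A deliberately tiny candidate pool over list-processing concepts.
CANDIDATES = [
    ("first", lambda xs: xs[0]),
    ("last", lambda xs: xs[-1]),
    ("maximum", lambda xs: max(xs)),
]

# Under-specified examples: 'first' and 'maximum' both fit, so the
# enumerator returns 'first' -- the wrong generalisation of "maximum".
sparse = [([5], 5), ([9, 1], 9)]

# Adding one example where the maximum is neither first nor last
# rules out the competing candidates.
informative = [([5], 5), ([1, 9, 2], 9)]
```

Running `synthesize(sparse, CANDIDATES)` yields `"first"`, while `synthesize(informative, CANDIDATES)` yields `"maximum"`: the single well-chosen extra example is what separates the intended concept from spurious ones, which is exactly the discriminating power the study finds non-expert example sets often lack.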