Compiler-Guided Inference-Time Adaptation: Improving GPT-5 Programming Performance in Idris

πŸ“… 2026-02-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the notably weaker programming performance of GPT-5 in the low-resource functional language Idris compared to mainstream languages, primarily due to its ineffective use of local compilation and test feedback. To bridge this gap, the authors propose an iterative, compiler-feedback-driven prompting strategy that, for the first time, integrates structured compiler errors and test failure information into the large language model’s reasoning process. By incorporating error-type-guided refinement, documentation-augmented prompts, and systematic feedback integration, the approach enables adaptive self-correction by the model. Evaluated on the Exercism platform, this method boosts GPT-5’s Idris exercise pass rate from 22 out of 56 to 54 out of 56, approaching its performance in mainstream languages and substantially narrowing the capability gap for low-resource programming languages.

πŸ“ Abstract
GPT-5, a state-of-the-art large language model from OpenAI, demonstrates strong performance in widely used programming languages such as Python, C++, and Java; however, its ability to operate in low-resource or less commonly used languages remains underexplored. This work investigates whether GPT-5 can effectively acquire proficiency in an unfamiliar functional programming language, Idris, through iterative, feedback-driven prompting. We first establish a baseline showing that with zero-shot prompting the model solves only 22 out of 56 Idris exercises on the Exercism platform, substantially underperforming relative to higher-resource languages (45 out of 50 in Python and 35 out of 47 in Erlang). We then evaluate several refinement strategies, including iterative prompting based on platform feedback, augmenting prompts with documentation and error-classification guides, and iterative prompting using local compilation errors and failed test cases. Among these approaches, incorporating local compilation errors yields the most substantial improvement, raising GPT-5's performance to 54 solved problems out of 56. These results suggest that while large language models may initially struggle in low-resource settings, structured compiler-level feedback can play a critical role in unlocking their capabilities.
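The error-guided refinement loop the abstract describes can be sketched as a small driver: compile the candidate solution, and if compilation or tests fail, feed the error report back to the model and retry. This is a minimal illustration, not the authors' implementation; `compile_fn` (which would wrap, e.g., the Idris compiler plus the exercise's test suite) and `ask_model_fn` (the GPT-5 call with the structured error report appended to the prompt) are hypothetical stand-ins supplied by the caller.

```python
def refine(initial_code, compile_fn, ask_model_fn, max_iters=5):
    """Iteratively repair code using compiler/test feedback.

    compile_fn(code)          -> (ok: bool, feedback: str)
    ask_model_fn(code, fb)    -> revised code (str), e.g. an LLM call
                                 with the error report in the prompt.
    Returns (solution or None, number of model calls made).
    """
    code = initial_code
    for attempt in range(max_iters):
        ok, feedback = compile_fn(code)
        if ok:
            # Compiles and passes tests: stop early.
            return code, attempt
        # Feed the structured error report back to the model.
        code = ask_model_fn(code, feedback)
    # Final check after the last revision.
    ok, _ = compile_fn(code)
    return (code if ok else None), max_iters
```

In the paper's setting, the feedback string would contain the local compilation errors and failed test cases, which proved more effective than platform-level feedback alone.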
Problem

Research questions and friction points this paper is trying to address.

large language models
low-resource languages
programming performance
compiler feedback
Idris
Innovation

Methods, ideas, or system contributions that make the work stand out.

compiler-guided adaptation
inference-time refinement
large language models
low-resource programming languages
iterative prompting
Minda Li
Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
Bhaskar Krishnamachari
Professor of Electrical and Computer Engineering, and Computer Science, USC
Internet of Things Β· AI Β· Machine Learning Β· Blockchain Β· Connected Vehicles