ACE-RLHF: Automated Code Evaluation and Socratic Feedback Generation Tool using Large Language Models and Reinforcement Learning with Human Feedback

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated feedback in beginner programming education suffers from poor interpretability and limited pedagogical guidance. Method: This paper proposes a Socratic code-feedback generation framework based on Reinforcement Learning from Human Feedback (RLHF), coupling RLHF with automated code evaluation to steer large language models (Llama-3-7B and GPT-3.5) toward generating pedagogically effective questions and hints rather than direct corrections. Contribution/Results: The authors introduce a competition-level benchmark tailored for programming-education evaluation and integrate two state-of-the-art optimization strategies, Proximal Policy Optimization (PPO) and Best-of-n sampling, alongside Reinforcement Learning from AI Feedback (RLAIF). Experiments show: (i) automated evaluation accuracy improves by 2-5% over non-RL baselines; (ii) human evaluation indicates nearly 40% higher feedback quality for GPT-3.5 under Best-of-n; and (iii) the approach achieves SOTA performance on both foundational and competitive programming datasets.
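As a rough illustration of the Best-of-n strategy mentioned in the summary: sample n candidate feedback messages and keep the one a reward model scores highest. This is a minimal sketch under standard-recipe assumptions; `generate_feedback` and `reward_model_score` below are hypothetical placeholders, not the paper's actual LLM or reward model.

```python
def generate_feedback(prompt: str, seed: int) -> str:
    # Placeholder candidates standing in for samples from a fine-tuned LLM.
    candidates = [
        "Fix line 3: use '==' instead of '='.",
        "What does your loop condition evaluate to on the final iteration?",
        "Have you printed the value of `i` right before the loop exits?",
    ]
    return candidates[seed % len(candidates)]

def reward_model_score(prompt: str, feedback: str) -> float:
    # Placeholder reward: favors Socratic questions over direct fixes,
    # mimicking a reward model trained on human preference data.
    return (2.0 if feedback.endswith("?") else 0.0) + 0.01 * len(feedback)

def best_of_n(prompt: str, n: int = 3) -> str:
    # Best-of-n: draw n candidates, return the highest-reward one.
    candidates = [generate_feedback(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: reward_model_score(prompt, c))
```

Unlike PPO, this requires no gradient updates to the policy model at inference time, which is one reason it pairs well with closed models such as GPT-3.5.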

📝 Abstract
Automated Program Repair tools are developed to generate feedback and suggest a repair method for erroneous code. State-of-the-art (SOTA) code repair methods rely on data-driven approaches and often fail to deliver solutions for complicated programming questions. To interpret the natural language of unprecedented programming problems, using Large Language Models (LLMs) for code-feedback generation is crucial. LLMs generate more comprehensible feedback than compiler-generated error messages, and Reinforcement Learning with Human Feedback (RLHF) further enhances quality by integrating a human in the loop, which helps novice students learn programming from scratch interactively. We apply RLHF fine-tuning to produce the expected Socratic response, such as a question with a hint, for resolving the programming issue. We propose a code-feedback generation tool, Automated Code Evaluation with RLHF (ACE-RLHF), built by fine-tuning LLMs with RLHF and combining two open-source LLM models with two different SOTA optimization techniques. Feedback quality is evaluated on two benchmark datasets containing basic and competition-level programming questions, the latter proposed by us. We achieved 2-5% higher accuracy than RL-free SOTA techniques using Llama-3-7B with Proximal Policy Optimization in automated evaluation, and similar or slightly higher accuracy compared to reward-model-free RL with AI Feedback (RLAIF). We achieved almost 40% higher accuracy with GPT-3.5 Best-of-n optimization in manual evaluation.
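The Llama-3-7B PPO variant in the abstract relies on PPO's clipped surrogate objective, which limits how far each update can move the policy away from the policy that generated the samples. A minimal per-sample sketch is below; the clip range `eps=0.2` is the common default, not a value taken from the paper.

```python
def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # PPO clipped surrogate for one sample:
    #   min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
    # where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage
    # (here, derived from the reward model's score on the feedback).
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

Taking the minimum makes the objective pessimistic: a policy update gains nothing from pushing the probability ratio beyond the clip range, which stabilizes RLHF fine-tuning.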
Problem

Research questions and friction points this paper is trying to address.

Improving code repair for complex programming problems using LLMs
Enhancing feedback quality with RLHF for novice programmers
Combining LLMs and RLHF to outperform existing repair methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs for code-feedback generation
Applies RLHF for Socratic response fine-tuning
Combines two LLMs with SOTA optimization techniques
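RLHF pipelines like the one outlined above typically first fit a reward model on pairwise human preferences (a preferred vs. a rejected feedback message) using the Bradley-Terry logistic loss, then optimize the policy against it. This is a sketch of that standard recipe as an assumption; the paper's exact reward-model training is not detailed on this page.

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # Bradley-Terry / logistic loss for reward-model training:
    #   -log(sigmoid(r_chosen - r_rejected))
    # Minimizing it pushes the reward model to score the human-preferred
    # (e.g., Socratic) feedback above the rejected one.
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two scores are equal the loss is log 2, and it falls toward zero as the chosen response's score pulls ahead of the rejected one.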
Tasnia Rahman
Department of Computer Science, Cleveland State University, Cleveland, OH, USA
Sathish A. P. Kumar
Department of Computer Science, Cleveland State University, Cleveland, OH, USA
Sumit Jha
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
Arvind Ramanathan
Argonne National Laboratory
Machine Learning, Computational Biology, Molecular biophysics, enzyme catalysis, higher-order statistics