Uncertainty Quantification for LLM-based Code Generation

📅 2026-05-12

🤖 AI Summary
This work addresses uncertainty quantification for large language model (LLM)-based code generation. Existing methods depend on a monotonicity assumption on risk and a single-label output framework, which makes them ill-suited to the multiple correct solutions inherent to code generation. To overcome these limitations, we propose RisCoSet, which applies multiple hypothesis testing to LLM code generation to construct risk-controlled prediction sets. Each set is represented as a partial program and is guaranteed, with high confidence, to contain a correct solution. RisCoSet removes the reliance on monotonicity and single-output constraints that limits the existing PAC prediction set approach, and so supports tasks with multiple correct outputs. In experiments on three LLMs, our method reduces code removal by up to 24.5% relative to the state of the art at the same level of risk.
📝 Abstract
Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular large language model (LLM)-based code generation, remains challenging. An existing approach proposes PAC prediction sets, but its strong monotonicity assumption on risk and its single-label classification framework severely restrict the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose RisCoSet, an approach that leverages multiple hypothesis testing to construct risk-controlling prediction sets for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state of the art, our method reduces code removal by up to 24.5% at the same level of risk.
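The abstract does not spell out how the hypothesis tests are set up, so the following is only a minimal sketch of one generic way to select risk-controlled configurations by multiple hypothesis testing, and not RisCoSet's actual procedure. It assumes each candidate configuration is identified by the fraction of code it removes, that a held-out calibration set yields a 0/1 loss per example (1 means the partial program contains no correct solution), and that a Bonferroni-corrected exact binomial test is used. The names `binom_cdf`, `certified_configs`, `least_removal`, `alpha`, and `delta` are all hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """Exact P[Binomial(n, p) <= k]."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def certified_configs(losses_by_config, alpha, delta):
    """Return configurations certified to have risk <= alpha.

    losses_by_config maps a configuration (here: fraction of code removed)
    to a list of 0/1 calibration losses. Each configuration gets its own
    test of the null hypothesis "risk > alpha", so no monotonic relation
    between configurations is assumed. Bonferroni correction keeps the
    chance of certifying any bad configuration at most delta.
    """
    m = len(losses_by_config)
    certified = []
    for config, losses in losses_by_config.items():
        n, k = len(losses), sum(losses)
        # Valid p-value for the null: the binomial lower tail at alpha.
        p_value = binom_cdf(k, n, alpha)
        if p_value <= delta / m:
            certified.append(config)
    return sorted(certified)


def least_removal(certified):
    """Pick the certified configuration that removes the least code."""
    return min(certified) if certified else None


# Toy calibration data (made up for illustration): 100 examples per
# configuration. The error counts are deliberately non-monotone in the
# removal fraction.
toy_losses = {
    0.1: [1] * 20 + [0] * 80,
    0.3: [1] * 2 + [0] * 98,
    0.5: [0] * 100,
    0.7: [1] * 15 + [0] * 85,
}
certified = certified_configs(toy_losses, alpha=0.1, delta=0.05)
best = least_removal(certified)
```

Testing every configuration separately is what lets the sketch drop the monotonicity assumption: in the toy data, the configuration that removes the most code is still rejected because its calibration error count is too high. Bonferroni is the simplest correction and is conservative; other multiple-testing corrections could be substituted.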
Problem

Research questions and friction points this paper is trying to address.

Uncertainty Quantification
LLM-based Code Generation
Prediction Sets
Multiple Valid Outputs
Risk Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty Quantification
Prediction Sets
LLM-based Code Generation
Multiple Hypothesis Testing
Risk Control