Risk Assessment Framework for Code LLMs via Leveraging Internal States

πŸ“… 2025-04-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing large language models (LLMs) for code generation suffer from unreliability, security vulnerabilities, and erroneous outputs, exacerbated by the absence of fine-grained, cross-lingual, and deployable risk assessment mechanisms. To address this, we propose PtTrustβ€”a novel framework that enables code-level risk awareness by directly leveraging internal representations of pre-trained LLMs. PtTrust introduces a two-stage, end-to-end, industrially deployable assessment paradigm: (1) unsupervised state representation pre-training, followed by (2) few-shot fine-tuning. It supports multi-language code modeling and interpretable feature extraction. Extensive experiments demonstrate that PtTrust achieves high-precision, line-level risk identification across diverse programming languages, exhibits strong generalization to unseen code patterns and vulnerabilities, and provides human-interpretable attribution evidence. By delivering actionable, transparent risk signals, PtTrust significantly enhances the trustworthiness of generated code and strengthens developer confidence in LLM-assisted coding.

πŸ“ Abstract
The pre-training paradigm plays a key role in the success of Large Language Models (LLMs), which have been recognized as one of the most significant recent advancements in AI. Building on these breakthroughs, code LLMs with advanced coding capabilities have a major impact on software engineering, showing a tendency to become an essential part of developers' daily routines. However, current code LLMs still face serious challenges related to trustworthiness, as they can generate incorrect, insecure, or unreliable code. Recent exploratory studies find that it can be promising to detect such risky outputs by analyzing LLMs' internal states, akin to how the human brain unconsciously recognizes its own mistakes. Yet, most of these approaches are limited to narrow sub-domains of LLM operations and fall short of achieving industry-level scalability and practicability. To address these challenges, in this paper, we propose PtTrust, a two-stage risk assessment framework for code LLMs based on internal state pre-training, designed to integrate seamlessly with the existing infrastructure of software companies. The core idea is that the risk assessment framework can itself undergo a pre-training process similar to that of LLMs. Specifically, PtTrust first performs unsupervised pre-training on large-scale unlabeled source code to learn general representations of LLM states. Then, it uses a small, labeled dataset to train a risk predictor. We demonstrate the effectiveness of PtTrust through fine-grained, code line-level risk assessment and show that it generalizes across tasks and different programming languages. Further experiments also reveal that PtTrust provides highly intuitive and interpretable features, fostering greater user trust. We believe PtTrust makes a promising step toward scalable and trustworthy assurance for code LLMs.
Problem

Research questions and friction points this paper is trying to address.

Assessing risks in code LLMs via internal states
Improving trustworthiness of code generation outputs
Scaling risk detection for industry-level practicality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage risk assessment framework for code LLMs
Internal state pre-training for risk detection
Generalizes across tasks and programming languages
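The two-stage idea behind PtTrust can be sketched in miniature: stage 1 learns a compact representation of the LLM's internal states from unlabeled data, and stage 2 fits a small risk predictor on a few labeled examples to score individual code lines. This is a hypothetical illustration, not the paper's actual implementation; the PCA-style linear encoder, the logistic-regression predictor, the dimensions, and the synthetic "hidden states" are all illustrative stand-ins.

```python
# Illustrative two-stage sketch (NOT the paper's implementation):
# stage 1 pre-trains a state representation on unlabeled data;
# stage 2 fine-tunes a tiny risk predictor on a small labeled set.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, LATENT = 64, 8  # hidden-state and latent sizes (illustrative)

def fit_encoder(states):
    """Stage 1: unsupervised pre-training. A PCA-style linear encoder
    stands in for learning general representations of LLM states."""
    mean = states.mean(axis=0)
    _, _, vt = np.linalg.svd(states - mean, full_matrices=False)
    return mean, vt[:LATENT]          # encoder: z = (x - mean) @ W.T

def encode(states, mean, w):
    return (states - mean) @ w.T

def fit_risk_predictor(z, y, lr=0.5, epochs=300):
    """Stage 2: few-shot supervised training. A logistic-regression
    head maps the learned representation to a per-line risk score."""
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))   # predicted risk
        g = p - y                                 # gradient of log-loss
        w -= lr * (z.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def risk_scores(z, w, b):
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))     # risk in [0, 1]

# Toy demo with synthetic "hidden states" (purely illustrative).
# Unlabeled states vary mostly along the first LATENT directions,
# so the pre-trained encoder retains them.
unlabeled = rng.normal(size=(500, HIDDEN))
unlabeled[:, :LATENT] *= 3.0
mean, enc = fit_encoder(unlabeled)

# Small labeled set: "risky" lines are shifted along one retained axis.
safe = rng.normal(size=(20, HIDDEN))
risky = rng.normal(size=(20, HIDDEN))
risky[:, 0] += 6.0
x = np.vstack([safe, risky])
y = np.array([0] * 20 + [1] * 20)

z = encode(x, mean, enc)
w, b = fit_risk_predictor(z, y)
```

The design point this sketch mirrors is that the expensive, label-free step (learning how internal states are structured) is done once, while the labeled step only has to fit a lightweight head, which is what makes a few-shot labeling budget plausible.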
πŸ”Ž Similar Papers
No similar papers found.
Yuheng Huang
Cedars-Sinai Medical Center
Lei Ma
The University of Tokyo, University of Alberta
Keizaburo Nishikino
Fujitsu Limited
Takumi Akazaki
Fujitsu Limited