Verifier Warnings Do Not Improve Comprehensibility Prediction

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether formal verifier warnings can serve as effective semantic features to enhance the performance of code understandability prediction models. Through a controlled treatment-control experiment, the total number of verifier warnings is incorporated into a machine learning model and compared against a baseline model that relies solely on syntactic and developer-related features. This work presents the first empirical evaluation of the predictive added value of verifier warnings for human-judged code understandability. The results indicate that this feature does not yield a statistically significant performance improvement, thereby challenging the common assumption that verifier warnings constitute a highly discriminative semantic signal. These findings offer new empirical evidence for the selection of semantic features in modeling code understandability.

📝 Abstract

Proponents of software verification suggest that code simplicity is linked to the effort required to verify code, hypothesizing that formal verifiers produce fewer false positive warnings and require less manual intervention when analyzing simpler code. A recent meta-analysis found empirical support for this hypothesis: a small correlation between the sum of verifier warnings and human-derived code comprehensibility metrics. Based on this finding, we conjectured that the sum of verifier warnings, used as an input feature representing program semantic information, can enhance the performance of machine learning (ML) models for code comprehensibility prediction when combined with traditional syntactic and developer features. To test this conjecture, we performed a control-treatment experiment: we incorporated the verifier warning sum feature into ML models from the literature and compared their performance against models trained only on syntactic and developer features. We found no significant difference in prediction performance between models with and without the warnings feature. Our findings suggest that, while a correlation exists, the verifier warning sum offers limited discriminative power: combining syntactic and developer features alone is just as effective for predicting human-judged code comprehensibility.
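A control-treatment comparison of this kind can be illustrated with a minimal Python sketch. The abstract does not say which statistical test or performance metric the authors used, so everything here is an assumption: the exact paired sign-flip permutation test is one common choice for comparing two models on the same folds, the function name `paired_permutation_test` is hypothetical, and the per-fold scores are invented purely for illustration and are not results from the paper.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(control, treatment):
    """Exact two-sided sign-flip permutation test on paired score differences.

    Returns (mean difference, p-value). Assumes both lists hold scores of the
    two models on the same evaluation folds, in the same order.
    """
    if len(control) != len(treatment):
        raise ValueError("score lists must be paired and of equal length")
    diffs = [t - c for c, t in zip(control, treatment)]
    observed = abs(mean(diffs))

    # Under the null hypothesis (the extra feature has no effect), each paired
    # difference is equally likely to be positive or negative, so we enumerate
    # every sign assignment and count how many are at least as extreme.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # small tolerance for float noise
            extreme += 1
        total += 1
    return mean(diffs), extreme / total


# Hypothetical per-fold scores, invented for illustration only.
# control: syntactic + developer features; treatment: same plus warning sum.
control_scores = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.67]
treatment_scores = [0.72, 0.67, 0.75, 0.71, 0.68, 0.73, 0.71, 0.68]

mean_diff, p_value = paired_permutation_test(control_scores, treatment_scores)
print(f"mean difference = {mean_diff:+.4f}, p = {p_value:.3f}")
```

A p-value above the chosen significance threshold would be read as "no significant difference" between the two feature sets, which is the shape of the conclusion the abstract reports. Exact enumeration is only practical for a small number of folds, since the number of sign assignments doubles with each added pair.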

Problem

Research questions and friction points this paper is trying to address.

code comprehensibility

verifier warnings

machine learning

program semantics

software verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

verifier warnings

code comprehensibility

machine learning