Detecting Underspecification in Software Requirements via k-NN Coverage Geometry

📅 2026-03-25

📈 Citations: 0

✨ Influential: 0

career value

129K/year

🤖 AI Summary

This work addresses the problem of missing requirement types—i.e., incomplete descriptions—in software requirements specifications by proposing a novel detection method that operates without manual annotations. The approach leverages a pretrained sentence encoder to map requirements into vector representations and integrates three components: geometric coverage, type-constrained distributional coverage, and unsupervised crowd counting. These are unified into a gap-scoring mechanism based on the z-score of k-nearest neighbor distances. Notably, this study is the first to combine geometric coverage with distributional statistics and introduces a tunable hyperparameter-based strategy for multi-component fusion. Evaluated on the PROMISE NFR benchmark, the method achieves an AUROC of 0.935 for projects containing more than 50 requirements, matching the performance of ground-truth counters that rely on human annotations.

Technology Category

Application Category

📝 Abstract

We propose \geogap{}, a geometric method for detecting missing requirement types in software specifications. The method represents each requirement as a unit vector via a pretrained sentence encoder, then measures coverage deficits through $k$-nearest-neighbour distances z-scored against per-project baselines. Three complementary scoring components -- per-point geometric coverage, type-restricted distributional coverage, and annotation-free population counting -- fuse into a unified gap score controlled by two hyperparameters. On the PROMISE NFR benchmark, \geogap{} achieves 0.935 AUROC for detecting completely absent requirement types in projects with $N \geq 50$ requirements, matching a ground-truth count oracle that requires human annotation. Six baselines confirm that each pipeline component -- per-project normalisation, neural embeddings, and geometric scoring -- contributes measurable value.

Problem

Research questions and friction points this paper is trying to address.

underspecification

software requirements

missing requirement types

requirement coverage

NFR

Innovation

Methods, ideas, or system contributions that make the work stand out.

geometric coverage

k-NN distance

requirement underspecification