🤖 AI Summary
This work addresses the problem of missing requirement types in software requirements specifications, that is, requirement types that a specification fails to cover at all, and proposes a detection method that operates without manual annotations. The approach uses a pretrained sentence encoder to map each requirement to a unit vector and integrates three scoring components: per-point geometric coverage, type-restricted distributional coverage, and annotation-free population counting. Coverage deficits are measured through k-nearest-neighbor distances that are z-scored against per-project baselines, and the three components are fused into a single gap score controlled by two hyperparameters. On the PROMISE NFR benchmark, the method achieves an AUROC of 0.935 for detecting completely absent requirement types in projects with at least 50 requirements, matching a ground-truth count oracle that relies on human annotations.
📝 Abstract
We propose \geogap{}, a geometric method for detecting missing requirement types in software specifications. The method represents each requirement as a unit vector via a pretrained sentence encoder, then measures coverage deficits through $k$-nearest-neighbour distances z-scored against per-project baselines. Three complementary scoring components -- per-point geometric coverage, type-restricted distributional coverage, and annotation-free population counting -- fuse into a unified gap score controlled by two hyperparameters. On the PROMISE NFR benchmark, \geogap{} achieves 0.935 AUROC for detecting completely absent requirement types in projects with $N \geq 50$ requirements, matching a ground-truth count oracle that requires human annotation. Six baselines confirm that each pipeline component -- per-project normalisation, neural embeddings, and geometric scoring -- contributes measurable value.
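As a rough illustration of the core idea (k-nearest-neighbour distances z-scored against a per-project baseline), the following is a minimal sketch and not the authors' implementation. It assumes that embeddings are already available as NumPy arrays, that cosine distance is used, and that the baseline is the leave-one-out k-NN distance among the project's own requirements. The names `unit_rows`, `knn_distance`, `gap_z_scores`, and the notion of "probe" vectors standing in for a requirement type are all hypothetical; the fusion of the three components and the two hyperparameters are not modelled.

```python
import numpy as np


def unit_rows(x):
    """Scale each row to unit length so that a dot product is a cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def knn_distance(queries, points, k, exclude_self=False):
    """Mean cosine distance from each query row to its k nearest rows in `points`."""
    dist = 1.0 - queries @ points.T
    if exclude_self:
        # Assumes queries and points are the same array (leave-one-out baseline).
        np.fill_diagonal(dist, np.inf)
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


def gap_z_scores(project, probes, k=3):
    """Z-score each probe's k-NN distance against the project's own k-NN distances.

    A large positive value means the probe lies in a region that the project's
    requirements cover poorly relative to the project's typical density.
    """
    project = unit_rows(project)
    probes = unit_rows(probes)
    baseline = knn_distance(project, project, k, exclude_self=True)
    mu, sigma = baseline.mean(), baseline.std()
    return (knn_distance(probes, project, k) - mu) / (sigma + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for encoder output: 20 "requirements" clustered around one direction.
    project = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(20, 3))
    # One probe inside the covered region, one in an uncovered direction.
    probes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(gap_z_scores(project, probes, k=3))
```

In this toy setup the second probe, which points away from every project requirement, should receive a much larger z-score than the first. Normalising against the project's own distances is what makes scores comparable across projects of different density, which is the role the abstract attributes to per-project normalisation.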