Detecting Underspecification in Software Requirements via k-NN Coverage Geometry

📅 2026-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of missing requirement types—i.e., incomplete descriptions—in software requirements specifications by proposing a novel detection method that operates without manual annotations. The approach leverages a pretrained sentence encoder to map requirements into vector representations and integrates three components: geometric coverage, type-constrained distributional coverage, and unsupervised crowd counting. These are unified into a gap-scoring mechanism based on the z-score of k-nearest neighbor distances. Notably, this study is the first to combine geometric coverage with distributional statistics and introduces a tunable hyperparameter-based strategy for multi-component fusion. Evaluated on the PROMISE NFR benchmark, the method achieves an AUROC of 0.935 for projects containing more than 50 requirements, matching the performance of ground-truth counters that rely on human annotations.

Technology Category

Application Category

📝 Abstract
We propose \geogap{}, a geometric method for detecting missing requirement types in software specifications. The method represents each requirement as a unit vector via a pretrained sentence encoder, then measures coverage deficits through $k$-nearest-neighbour distances z-scored against per-project baselines. Three complementary scoring components -- per-point geometric coverage, type-restricted distributional coverage, and annotation-free population counting -- fuse into a unified gap score controlled by two hyperparameters. On the PROMISE NFR benchmark, \geogap{} achieves 0.935 AUROC for detecting completely absent requirement types in projects with $N \geq 50$ requirements, matching a ground-truth count oracle that requires human annotation. Six baselines confirm that each pipeline component -- per-project normalisation, neural embeddings, and geometric scoring -- contributes measurable value.
Problem

Research questions and friction points this paper is trying to address.

underspecification
software requirements
missing requirement types
requirement coverage
NFR
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometric coverage
k-NN distance
requirement underspecification
neural sentence embeddings
unsupervised gap detection
Wenyan Yang
Wenyan Yang
Aalto University
Computer VisionImitation LearningReinforcement Learning
T
Tomáš Janovec
Tampere University
S
Samantha Bavautdin
Tampere University