🤖 AI Summary
This study addresses the lack of effective validation methods for semi-formal blueprints (concise constrained-language descriptions of feature hierarchies and constraints) in early-stage software product line engineering, where structural and constraint-related errors in feature models can otherwise go undetected. It systematically evaluates how well large language models (LLMs) perform feature model analysis. It applies twelve state-of-the-art LLMs to sixteen standard analysis operations, which require both parsing the feature structure and reasoning over constraints. LLM outputs are benchmarked against the solver-based tool FLAMA, which serves as the oracle. Reasoning-optimized models such as Grok 4 Fast Reasoning and Gemini 2.5 Pro achieve average accuracies of 88–89% across all evaluated blueprints and operations, approaching solver correctness. These findings support the feasibility of LLMs as lightweight tools for early-stage validation of feature models.
📝 Abstract
We study whether Large Language Models (LLMs) can perform feature model analysis operations (AOs) directly on semi-formal textual blueprints, i.e., concise constrained-language descriptions of feature hierarchies and constraints, enabling early validation in Software Product Line scoping. Using 12 state-of-the-art LLMs and 16 standard AOs, we compare their outputs against the solver-based oracle FLAMA. Results show that reasoning-optimized models (e.g., Grok 4 Fast Reasoning, Gemini 2.5 Pro) achieve 88-89% average accuracy across all evaluated blueprints and operations, approaching solver correctness. We identify systematic errors in structural parsing and constraint reasoning, and highlight accuracy-cost trade-offs that inform model selection. These findings position LLMs as lightweight assistants for early variability validation.
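The sketch below shows what an analysis operation computes and how an answer can be scored against an oracle. It is not FLAMA's API and not the paper's benchmark.

The sketch makes several assumptions:

- It encodes a small hypothetical feature model. The group kinds, the constraint lists, and the helper names `valid_products`, `analyze` and `accuracy` are all assumptions made for illustration.
- It answers four common AOs (void model, number of products, core features, dead features) by brute-force enumeration rather than with a SAT or BDD solver.
- It scores a hypothetical LLM answer by exact match per operation. The paper's actual scoring may differ.

```python
from itertools import product

# Hypothetical toy feature model, NOT a blueprint from the paper.
# Each group is (parent, kind, children); kind is one of
# "mandatory", "optional", "alternative" (exactly one) or "or" (at least one).
ROOT = "Phone"
FEATURES = ["Phone", "Calls", "GPS", "Screen", "Basic", "Color",
            "HighRes", "Media", "Camera", "MP3"]
GROUPS = [
    ("Phone", "mandatory", ["Calls", "Screen"]),
    ("Phone", "optional", ["GPS", "Media"]),
    ("Screen", "alternative", ["Basic", "Color", "HighRes"]),
    ("Media", "or", ["Camera", "MP3"]),
]
REQUIRES = [("Camera", "HighRes")]
# "MP3 excludes Calls" is a deliberately seeded modelling error:
# Calls is mandatory, so MP3 can never be selected (a dead feature).
EXCLUDES = [("GPS", "Basic"), ("MP3", "Calls")]


def is_valid(sel):
    """Check one configuration (a set of selected feature names) against the model."""
    if ROOT not in sel:
        return False
    for parent, kind, children in GROUPS:
        chosen = [c for c in children if c in sel]
        if chosen and parent not in sel:      # child selected without its parent
            return False
        if parent not in sel:
            continue
        if kind == "mandatory" and len(chosen) != len(children):
            return False
        if kind == "alternative" and len(chosen) != 1:
            return False
        if kind == "or" and not chosen:
            return False
    if any(a in sel and b not in sel for a, b in REQUIRES):
        return False
    if any(a in sel and b in sel for a, b in EXCLUDES):
        return False
    return True


def valid_products():
    """Enumerate every valid product by brute force over all 2^n selections."""
    prods = []
    for bits in product([False, True], repeat=len(FEATURES)):
        sel = frozenset(f for f, on in zip(FEATURES, bits) if on)
        if is_valid(sel):
            prods.append(sel)
    return prods


def analyze():
    """Oracle answers for four analysis operations."""
    prods = valid_products()
    return {
        "void": len(prods) == 0,
        "count": len(prods),
        "core": frozenset.intersection(*prods) if prods else frozenset(),
        "dead": frozenset(FEATURES) - frozenset().union(*prods),
    }


def accuracy(answers, oracle):
    """Fraction of operations whose answer exactly matches the oracle."""
    hits = sum(1 for op, expected in oracle.items() if answers.get(op) == expected)
    return hits / len(oracle)


oracle = analyze()
# Hypothetical LLM answer that overlooks "MP3 excludes Calls":
# it misses the dead feature and therefore over-counts products.
llm = {"void": False, "count": 14,
       "core": frozenset({"Phone", "Calls", "Screen"}), "dead": frozenset()}
print(oracle["count"], sorted(oracle["dead"]), accuracy(llm, oracle))
# prints: 7 ['MP3'] 0.5
```

Enumeration grows exponentially with the number of features, so it only works as an oracle for toy models. Realistic models call for a solver-based oracle such as FLAMA.

In this toy model, the seeded constraint contradicts a mandatory feature, so a single overlooked cross-tree constraint makes two of the four answers wrong.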