🤖 AI Summary
Problem: Architectural knowledge in developer communities (e.g., Stack Overflow) is unstructured and fragmented, which makes manually identifying architecture-related problems and solutions inefficient and error-prone.
Method: This paper proposes ArchISMiner, a framework for automatically mining architecture problem–solution pairs. Its first component, ArchPI, trains and compares conventional ML/DL models, PLMs, and LLMs, then selects the best performer to identify Architecture-Related Posts (ARPs). Its second component, ArchISPE, uses indirect supervision over BERT embeddings and local TextCNN features to extract fine-grained problems and solutions.
Contribution/Results: ArchISMiner achieves an F1-score of 0.960 for ARP identification, and F1-scores of 0.883 and 0.894 for problem and solution extraction, respectively. Applied to three additional forums, it yields a released dataset of over 18K architecture problem–solution pairs, supporting the practical reuse of architectural knowledge.
📝 Abstract
Stack Overflow (SO), a leading online community forum, is a rich source of software development knowledge. However, locating architectural knowledge, such as architectural solutions, remains challenging due to the overwhelming volume of unstructured content and fragmented discussions. Developers must manually sift through posts to find relevant architectural insights, which is time-consuming and error-prone. This study introduces ArchISMiner, a framework for mining architectural knowledge from SO. The framework comprises two complementary components: ArchPI and ArchISPE. ArchPI trains and evaluates multiple models, including conventional ML/DL models, Pre-trained Language Models (PLMs), and Large Language Models (LLMs), and selects the best-performing model to automatically identify Architecture-Related Posts (ARPs) among programming-related discussions. ArchISPE employs an indirectly supervised approach that leverages diverse features, including BERT embeddings and local TextCNN features, to extract architectural issue-solution pairs. Our evaluation shows that the best model in ArchPI achieves an F1-score of 0.960 in ARP detection, and that ArchISPE outperforms baselines from both the SE and NLP fields, achieving F1-scores of 0.883 for architectural issues and 0.894 for solutions. A user study further validated the quality (e.g., relevance and usefulness) of the identified ARPs and the extracted issue-solution pairs. Moreover, we applied ArchISMiner to three additional forums and released a dataset of over 18K architectural issue-solution pairs. Overall, ArchISMiner can help architects and developers identify ARPs and extract succinct, relevant, and useful architectural knowledge from developer communities more accurately and efficiently. The replication package of this study is available at https://github.com/JeanMusenga/ArchISPE
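The model-selection step described for ArchPI (evaluate several candidate classifiers, keep the one with the highest F1-score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy posts, and the two keyword-based stand-in classifiers below are all hypothetical, chosen only to show how an F1-based comparison might work. A real pipeline would plug trained ML/DL, PLM, or LLM classifiers into the same interface.

```python
from typing import Callable, Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class (1 = architecture-related post)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best_model(
    candidates: Dict[str, Callable[[str], int]],
    posts: List[str],
    labels: List[int],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on held-out posts and return the top F1 scorer."""
    scores = {
        name: f1_score(labels, [classify(post) for post in posts])
        for name, classify in candidates.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hypothetical held-out data: 1 = architecture-related, 0 = programming-related.
POSTS = [
    "Which architecture pattern fits a microservices payment system?",
    "How do I fix a NullPointerException in Java?",
    "Trade-offs between layered and event-driven design",
    "Regex to match an email address",
]
LABELS = [1, 0, 1, 0]

# Hypothetical stand-ins for trained models: simple keyword rules.
CANDIDATES: Dict[str, Callable[[str], int]] = {
    "keyword_architecture": lambda p: int("architecture" in p.lower()),
    "keyword_broad": lambda p: int(
        any(k in p.lower() for k in ("architecture", "design", "pattern"))
    ),
}

if __name__ == "__main__":
    best_name, all_scores = select_best_model(CANDIDATES, POSTS, LABELS)
    print(best_name, all_scores)
```

Keeping every candidate behind the same `post -> label` callable is a design choice for the sketch: it lets heterogeneous model families be compared on identical held-out data with a single metric.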