A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs

📅 2025-11-23
🤖 AI Summary
In open-source development, developers struggle to identify functionally suitable reusable components in large repositories. To address this, we propose TreeRec, an intent-driven hierarchical recommendation framework that brings feature trees into large language model (LLM)-based recommendation. TreeRec uses LLM-based semantic abstraction to organize artifacts into an ontology-inspired hierarchical feature tree, then aligns developer intent with artifact function level by level, so the LLM reasons over a much smaller set of candidates than the full repository. This design reduces LLM inference overhead while improving semantic matching accuracy. Experiments across three open-source ecosystems show that TreeRec consistently improves recommendation precision for multiple mainstream LLMs while reducing inference latency, which points to good generalizability and practical feasibility for real-world deployment.

📝 Abstract
In open source software development, the reuse of existing artifacts has been widely adopted to avoid redundant implementation work. Reusable artifacts are considered more efficient and reliable than developing software components from scratch. However, when faced with a large number of reusable artifacts, developers often struggle to find artifacts that can meet their expected needs. To reduce this burden, retrieval-based and learning-based techniques have been proposed to automate artifact recommendations. Recently, Large Language Models (LLMs) have shown the potential to understand intentions, perform semantic alignment, and recommend usable artifacts. Nevertheless, their effectiveness has not been thoroughly explored. To fill this gap, we construct an intent-driven artifact recommendation benchmark named IntentRecBench, covering three representative open source ecosystems. Using IntentRecBench, we conduct a comprehensive comparative study of five popular LLMs and six traditional approaches in terms of precision and efficiency. Our results show that although LLMs outperform traditional methods, they still suffer from low precision and high inference cost due to the large candidate space. Inspired by the ontology-based semantic organization in software engineering, we propose TreeRec, a feature tree-guided recommendation framework to mitigate these issues. TreeRec leverages LLM-based semantic abstraction to organize artifacts into a hierarchical semantic tree, enabling intent and function alignment and reducing reasoning time. Extensive experiments demonstrate that TreeRec consistently improves the performance of diverse LLMs across ecosystems, highlighting its generalizability and potential for practical deployment.
Problem

Research questions and friction points this paper is trying to address.

Finding reusable software artifacts that match developer intentions from large collections
Addressing low precision and high computational costs in LLM-based artifact recommendations
Improving intent-function alignment and reasoning efficiency in artifact recommendation systems
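
The source does not say which precision metric the benchmark uses. As one common way to quantify "low precision" in a recommendation setting, the sketch below computes precision@k. The function name `precision_at_k` and the artifact names are hypothetical and are not taken from the paper.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended artifacts that are actually relevant.

    This is an assumed metric for illustration, not the paper's definition.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for name in top_k if name in relevant)
    return hits / k


# Hypothetical ranked output and ground truth for a single developer intent.
ranked = ["pkg-a", "pkg-b", "pkg-c", "pkg-d"]
ground_truth = {"pkg-b", "pkg-d"}

p_at_1 = precision_at_k(ranked, ground_truth, 1)
p_at_2 = precision_at_k(ranked, ground_truth, 2)
```

With a large candidate space, relevant artifacts are easily pushed out of the top k, which is the kind of precision loss the problem statement refers to.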
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based semantic abstraction organizes artifacts hierarchically
TreeRec framework reduces reasoning time and improves precision
Feature tree-guided recommendation aligns intents with functions
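
The tree-guided idea listed above can be illustrated with a minimal sketch. It assumes a hand-built feature tree and uses simple word overlap as a stand-in for the LLM's relevance judgment. `FeatureNode`, `overlap_score`, `recommend`, and every label and artifact name are hypothetical. This is not TreeRec's implementation, only a picture of how walking a hierarchy shrinks the number of candidates scored per query.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureNode:
    """One node of a hypothetical feature tree."""
    label: str  # short functional description of this subtree
    children: List["FeatureNode"] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)  # filled on leaves only


def overlap_score(intent: str, label: str) -> int:
    """Toy stand-in for an LLM relevance judgment: count shared words."""
    return len(set(intent.lower().split()) & set(label.lower().split()))


def recommend(root: FeatureNode, intent: str, score=overlap_score):
    """Walk from the root, following the best-scoring child at each level.

    Returns the artifacts at the reached leaf and the number of score calls,
    a rough proxy for how many candidates had to be examined.
    """
    node, calls = root, 0
    while node.children:
        scored = [(score(intent, child.label), child) for child in node.children]
        calls += len(scored)
        # max() keeps the first child on ties, so the walk is deterministic.
        node = max(scored, key=lambda pair: pair[0])[1]
    return node.artifacts, calls


# Hypothetical two-level tree with four leaf artifacts.
tree = FeatureNode("root", children=[
    FeatureNode("web networking http", children=[
        FeatureNode("http client requests", artifacts=["pkg-http-client"]),
        FeatureNode("web server routing", artifacts=["pkg-web-server"]),
    ]),
    FeatureNode("data parsing serialization", children=[
        FeatureNode("json parsing", artifacts=["pkg-json"]),
        FeatureNode("csv parsing", artifacts=["pkg-csv"]),
    ]),
])

artifacts, calls = recommend(tree, "send http requests to a web api")
```

In this toy tree the walk scores two nodes per level. With branching factor b and depth d, a greedy walk scores roughly b × d nodes, while a flat search scores every leaf, so the gap widens as the repository grows. A real system would replace `overlap_score` with a model call and would likely keep more than one branch per level to recover from early mistakes.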
Authors

Dongming Jin
Peking University
Requirements Engineering, Large Language Models

Zhi Jin
Sun Yat-Sen University, Associate Professor

Xiaohong Chen
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, China

Zheng Fang
Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science, Peking University, Beijing, China

Linyu Li
Peking University
knowledge graph, ai4science

Yuanpeng He
Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science, Peking University, Beijing, China

Jia Li
School of Computer Science, Wuhan University, Wuhan, China

Yirang Zhang
Nanyang Technological University, Singapore

Yingtao Fang
School of Computer Science, Wuhan University, Wuhan, China