Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs

πŸ“… 2025-01-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the challenges of repository comprehension and business-semantic gap in large-scale enterprise systems, this paper proposes the first repository-level hierarchical code summarization framework tailored for business applications. Methodologically, it first performs syntax-aware structural analysis via Abstract Syntax Trees (ASTs) and leverages a local large language model (LLM) to generate fine-grained function- and variable-level summaries. Subsequently, hierarchical aggregation yields file- and package-level summaries, enhanced by domain-adaptive prompting that incorporates business contextβ€”e.g., telecom Business Support Systems (BSS)β€”to improve interpretability and relevance. The key contribution is the first integrated modeling paradigm synergizing AST-based syntactic analysis, local LLM inference, and domain-specific prompt engineering, thereby balancing technical precision with business-intent articulation. Evaluation on real-world telecom BSS systems demonstrates a 32% increase in summary coverage and a 41% improvement in human-assessed business relevance, significantly outperforming both single-layer and general-purpose model baselines.

Technology Category

Application Category

πŸ“ Abstract
In large-scale software development, understanding the functionality and intent behind complex codebases is critical for effective development and maintenance. While code summarization has been widely studied, existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages. Additionally, current summarization models tend to emphasize low-level implementation details, often overlooking the domain and business context that are crucial for real-world applications. This paper proposes a two-step hierarchical approach for repository-level code summarization, tailored to business applications. First, smaller code units such as functions and variables are identified using syntax analysis and summarized with local LLMs. These summaries are then aggregated to generate higher-level file and package summaries. To ensure the summaries are grounded in business context, we design custom prompts that capture the intended purpose of code artifacts based on the domain and problem context of the business application. We evaluate our approach on a business support system (BSS) for the telecommunications domain, showing that syntax analysis-based hierarchical summarization improves coverage, while business-context grounding enhances the relevance of the generated summaries.
Problem

Research questions and friction points this paper is trying to address.

Code Understanding
Large Software Projects
Business-related Code Summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Summarization
Telecom Business Systems
Two-stage Method
πŸ”Ž Similar Papers
No similar papers found.
N
Nilesh Dhulshette
TCS Research, Tata Consultancy Services Ltd., Pune, India
S
Sapan Shah
TCS Research, Tata Consultancy Services Ltd., Pune, India
Vinay Kulkarni
Vinay Kulkarni
Tata Consultancy Services, Pune, India
Digital TwinsAdaptive ArchitectureAI in SEModel Driven Engineering