AutoFL: A Tool for Automatic Multi-granular Labelling of Software Repositories

📅 2024-08-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Software developers find it inefficient and time-consuming to comprehend large, multifunctional codebases, and existing README-based, coarse-grained project-level categorization does not support fine-grained functional understanding. To address this, we propose AutoFL, the first automated functional domain labeling method that works across granularities, supporting file-, package-, and project-level annotations without relying on non-code documentation such as READMEs. AutoFL models source code semantics directly through a weakly supervised learning framework. This framework integrates code text parsing, multi-granularity semantic embedding, and hierarchical aggregation to generate labels end to end. Evaluated across multilingual open-source projects, AutoFL significantly improves the accuracy, consistency, and interpretability of functional labels compared to baselines. By enabling precise, scalable, and documentation-agnostic functional awareness, it alleviates key bottlenecks in software comprehension.

📝 Abstract

Software comprehension, especially of new code bases, is time-consuming for developers, particularly in large projects with multiple functionalities spanning various domains. One strategy to reduce this effort involves annotating files with meaningful labels that describe the functionalities they contain. However, prior research has so far focused on classifying the whole project using README files as a proxy, resulting in little information gained for the developers. Our objective is to streamline the labelling of files with the correct application domains using source code as input. To achieve this, in prior work, we evaluated the ability to annotate files automatically using a weak labelling approach. This paper presents AutoFL, a tool for automatically labelling software repositories from source code. AutoFL allows multi-granular annotations at the file, package, and project level. We provide an overview of the tool's internals, present an example analysis for which AutoFL can be used, and discuss limitations and future work.
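The weak labelling idea of assigning an application domain to a file from its source code can be illustrated with a minimal sketch. The sketch rests on two assumptions. First, it uses a hypothetical keyword table, `DOMAIN_KEYWORDS`. Second, it treats each keyword found among a file's identifier words as one vote for a domain and abstains when nothing matches. AutoFL's actual labelling functions and domain taxonomy are not described in this summary, so this is only an illustration of the general technique and not the tool's implementation.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical domain keywords for illustration only; AutoFL's real
# taxonomy and labelling functions are not given in this summary.
DOMAIN_KEYWORDS = {
    "networking": {"socket", "http", "request", "url"},
    "database": {"sql", "query", "table", "cursor"},
    "ui": {"button", "window", "widget", "render"},
}


def tokenize(source: str) -> list[str]:
    """Split identifiers (snake_case and camelCase) into lowercase words."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", source):
        # Split camelCase and acronyms, e.g. "sqlQuery" -> ["sql", "Query"]
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        words.extend(part.lower() for part in parts)
    return words


def label_file(source: str) -> str | None:
    """Return the domain with the most keyword votes, or None to abstain."""
    votes: Counter[str] = Counter()
    for word in tokenize(source):
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if word in keywords:
                votes[domain] += 1  # each keyword hit acts as a weak vote
    if not votes:
        return None  # no labelling signal fired for this file
    return votes.most_common(1)[0][0]
```

For example, `label_file("def send_request(url):\n    return http_get(url)")` collects votes from `request`, `url`, and `http` and returns `"networking"`. A file with no matching identifiers gets no label, which is more conservative than forcing a guess.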

Problem

Research questions and friction points this paper is trying to address.

Automate the labelling of software repositories for faster code comprehension

Enable multi-granular annotations at file, package, and project levels

Improve domain-specific file classification using source code input

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated multi-granular labelling of repositories

Uses source code for domain-specific annotations

Supports file, package, and project-level granularity
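The file, package, and project granularities can be related with a small sketch. It assumes that packages correspond to directories and that higher-level annotations are obtained by counting file-level labels. The function `aggregate` and its input format (a mapping from file path to a domain label, or `None` for an unlabelled file) are hypothetical and are not AutoFL's API.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from pathlib import PurePosixPath


def aggregate(
    file_labels: dict[str, str | None],
) -> tuple[dict[str, Counter[str]], Counter[str]]:
    """Roll file-level labels up to package (directory) and project level."""
    packages: defaultdict[str, Counter[str]] = defaultdict(Counter)
    project: Counter[str] = Counter()
    for path, label in file_labels.items():
        if label is None:
            continue  # unlabelled files contribute no votes
        package = str(PurePosixPath(path).parent)  # e.g. "net/client.py" -> "net"
        packages[package][label] += 1
        project[label] += 1
    return dict(packages), project
```

Given `{"net/client.py": "networking", "net/server.py": "networking", "db/store.py": "database", "db/util.py": None}`, the `net` package counts two `networking` files, the `db` package counts one `database` file, and the project counts both domains. Keeping a label distribution at the higher levels, rather than a single majority label, preserves the fact that a project can span several domains.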

🔎 Similar Papers

An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots