LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

📅 2026-03-04
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing open-source audio annotation tools struggle to capture the nuanced subjective differences in human perception of musical semantics, thereby hindering intent alignment between humans and machines in music information retrieval. To address this limitation, this work proposes LabelBuddy—an open-source, collaborative, AI-assisted audio annotation platform. Its key innovation lies in integrating a containerized backend architecture, a multi-user consensus mechanism, and a pluggable model interface that enables flexible integration of custom models—including large audio language models—for pre-annotation. Furthermore, the platform supports dynamic human-AI collaborative labeling through extensible AI agents. LabelBuddy provides a scalable infrastructure for community-driven semantic audio representation learning and iterative model development.

📝 Abstract
The advancement of machine learning (ML), Large Audio Language Models (LALMs), and autonomous AI agents in Music Information Retrieval (MIR) necessitates a shift from static tagging to rich, human-aligned representation learning. However, the scarcity of open-source infrastructure capable of capturing the subjective nuances of audio annotation remains a critical bottleneck. This paper introduces LabelBuddy, an open-source, collaborative, auto-tagging audio annotation tool designed to bridge the gap between human intent and machine understanding. Unlike static tools, it decouples the interface from inference via containerized backends, allowing users to plug in custom models for AI-assisted pre-annotation. We describe the system architecture, which supports multi-user consensus and containerized model isolation, and outline a roadmap for extending the tool with agents and LALMs. Code is available at https://github.com/GiannisProkopiou/gsoc2022-Label-buddy.
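
The two architectural ideas named in the abstract, a pluggable model backend for pre-annotation and multi-user consensus, can be illustrated with a minimal Python sketch. It is not taken from the LabelBuddy codebase: `Segment`, `PreAnnotator`, `DummyBackend`, and `consensus_label` are hypothetical names, and strict majority voting is an assumed stand-in for whatever consensus rule the tool actually applies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Segment:
    """A labeled time span in an audio file, in seconds (hypothetical schema)."""
    start: float
    end: float
    label: str


class PreAnnotator(Protocol):
    """Assumed contract for a pluggable model backend.

    Any object with a matching `predict` method satisfies it, so the
    annotation interface never needs to import model code directly.
    """

    def predict(self, audio_path: str) -> List[Segment]: ...


class DummyBackend:
    """Stand-in for a containerized model; returns fixed segments."""

    def predict(self, audio_path: str) -> List[Segment]:
        return [Segment(0.0, 5.0, "speech"), Segment(5.0, 12.0, "music")]


def consensus_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its vote share exceeds min_agreement.

    Returns None when there are no votes or no label is agreed on widely
    enough, signalling that the segment needs further review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Pre-annotate with the backend, then resolve annotator votes per segment.
backend: PreAnnotator = DummyBackend()
suggestions = backend.predict("example.wav")
agreed = consensus_label(["music", "music", "speech"])
```

In a containerized deployment, a `predict` implementation would more plausibly forward the request to a model service than run inference in-process; the interface seen by the annotation front end would stay the same.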
Problem

Research questions and friction points this paper is trying to address.

audio annotation
subjective nuance
open-source infrastructure
Music Information Retrieval
human-aligned representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-assisted annotation
containerized backend
collaborative tagging
Large Audio Language Models
open-source audio tool
Ioannis Prokopiou
Athens University of Economics and Business
Ioannis Sina
University of Patras
Agisilaos Kounelis
University of Patras
Pantelis Vikatos
Orfium
Themos Stafylakis
Assoc. Prof. at Athens Univ. of Economics and Business | Omilia | Archimedes/Athena R.C.
Voice Biometrics, Speaker Recognition, Audiovisual ASR, NLP, Machine Learning