🤖 AI Summary
Existing automatic guitar transcription methods suffer from two critical limitations: (1) they fail to recognize essential performance techniques such as slides, bends, and percussive hits, and (2) they frequently assign notes to the wrong string and fret, rendering the generated scores unplayable. This paper introduces what is, to the authors' knowledge, the first end-to-end, four-stage system that fully automates the generation of playable, technique-annotated guitar tablature with precise fingering from raw audio. The approach integrates an audio-to-MIDI converter, a dedicated MLP-based guitar-technique classifier, a Transformer-based string/fret assignment module, and an LSTM-driven tablature generation component. Crucially, it transfers prior knowledge from piano transcription while adapting to guitar-specific acoustics and playing constraints. Experiments on real-world recordings demonstrate significant improvements in technique detection accuracy and fingering plausibility. Quantitative and qualitative evaluations confirm that the system surpasses state-of-the-art methods in both technique annotation completeness and score playability.
📝 Abstract
Automatic Music Transcription (AMT) has advanced significantly for the piano, but transcription for the guitar remains limited due to several key challenges. Existing systems fail to detect and annotate expressive techniques (e.g., slides, bends, percussive hits) and map notes to the wrong string and fret combination in the generated tablature. Furthermore, prior models are typically trained on small, isolated datasets, limiting their generalizability to real-world guitar recordings. To overcome these limitations, we propose a four-stage end-to-end pipeline that produces detailed guitar tablature directly from audio. Our system consists of (1) audio-to-MIDI pitch conversion through a piano transcription model adapted to guitar datasets; (2) MLP-based expressive technique classification; (3) Transformer-based string and fret assignment; and (4) LSTM-based tablature generation. To the best of our knowledge, this framework is the first to generate detailed tablature with accurate fingerings and expressive labels from guitar audio.
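To illustrate why string and fret assignment (stage 3) is a genuine modeling problem and not a lookup, the sketch below enumerates every position on the fretboard that sounds a given MIDI pitch. It is not the paper's Transformer module. It assumes standard tuning (E2 A2 D3 G3 B3 E4) and a 20-fret limit, and the names `STANDARD_TUNING`, `fret_candidates`, and `max_fret` are hypothetical, chosen only for this example.

```python
# Illustrative sketch only: lists the candidate positions that a string/fret
# assignment model would have to choose between. Not the paper's method.

# Assumption: standard tuning, low to high (E2 A2 D3 G3 B3 E4), as MIDI note numbers.
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)


def fret_candidates(midi_pitch, tuning=STANDARD_TUNING, max_fret=20):
    """Return every (string, fret) pair that sounds midi_pitch.

    Strings are numbered from len(tuning) (lowest pitch) down to 1 (highest
    pitch), following the usual guitar convention. max_fret is an assumed
    fretboard limit.
    """
    candidates = []
    for index, open_pitch in enumerate(tuning):
        fret = midi_pitch - open_pitch  # one fret raises the pitch by one semitone
        if 0 <= fret <= max_fret:
            candidates.append((len(tuning) - index, fret))
    return candidates


if __name__ == "__main__":
    # E4 (MIDI 64) can be played on five different strings.
    print(fret_candidates(64))  # [(5, 19), (4, 14), (3, 9), (2, 5), (1, 0)]
```

A single pitch can therefore map to several playable positions, and the choice depends on the surrounding notes and hand position, which is why a sequence model is a plausible fit for this stage.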