🤖 AI Summary
To address the limited expressivity and controllability of electric guitar audio synthesis, this paper proposes GuitarFlow, a two-stage, tablature-driven framework. First, a simple sample-based virtual instrument renders the tablature into coarse audio; second, flow matching performs style transfer that turns this virtual-instrument audio into more realistic guitar sound. Because tablature directly encodes guitar-specific playing techniques such as bends, muted strings, and legatos, these techniques can be controlled in a way that formats such as MIDI do not easily support. The model is quick to train and to run at inference, and it requires less than six hours of training data. Objective metrics and a listening test show a significant improvement in the realism of guitar audio generated from tablatures.
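The first stage, rendering tablature into coarse audio with a sample-based virtual instrument, can be illustrated with a toy sketch. It assumes standard tuning and a hypothetical `TabNote` record with `palm_mute` and `bend` fields; a decaying sine stands in for each recorded sample, so this shows only how tab positions and techniques map to pitch and envelope, not the instrument used in the paper.

```python
import numpy as np
from dataclasses import dataclass

# Assumption: standard tuning, open-string MIDI pitches keyed by string number
# (6 = low E2 ... 1 = high E4).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


@dataclass
class TabNote:
    """Hypothetical tablature event, not a format defined by the paper."""
    string: int              # 1 (high E) .. 6 (low E)
    fret: int                # 0 = open string
    onset: float             # start time in seconds
    duration: float          # length in seconds
    palm_mute: bool = False  # palm-muted note
    bend: float = 0.0        # bend target in semitones, reached at note end


def midi_pitch(note):
    """Fretted pitch: open-string pitch plus one semitone per fret."""
    return STANDARD_TUNING[note.string] + note.fret


def midi_to_hz(pitch):
    """Equal-temperament conversion (A4 = MIDI 69 = 440 Hz); accepts arrays."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


def render(notes, sr=16000):
    """Toy stand-in for a sample-based virtual instrument.

    Each note becomes a decaying sine instead of a recorded sample. A bend
    glides linearly in pitch over the note; a palm mute shortens the decay.
    """
    total = max(n.onset + n.duration for n in notes)
    out = np.zeros(int(np.ceil(total * sr)))
    for n in notes:
        length = int(n.duration * sr)
        t = np.arange(length) / sr
        pitch = midi_pitch(n) + n.bend * t / n.duration  # per-sample pitch
        phase = 2 * np.pi * np.cumsum(midi_to_hz(pitch)) / sr
        decay = 25.0 if n.palm_mute else 3.0  # faster decay when palm-muted
        start = int(n.onset * sr)
        out[start:start + length] += np.sin(phase) * np.exp(-decay * t)
    return out
```

For example, `render([TabNote(1, 5, 0.0, 0.5), TabNote(6, 0, 0.5, 0.5, palm_mute=True)])` yields one second of audio: an A4 (string 1, fret 5) followed by a palm-muted open low E. In the two-stage design, audio of this coarse kind is what the second stage is meant to make realistic.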
📝 Abstract
Music generation in the audio domain using artificial intelligence (AI) has made steady progress in recent years. However, for some instruments, particularly the guitar, controllable instrument synthesis remains limited in expressivity. We introduce GuitarFlow, a model designed specifically for electric guitar synthesis. The generative process is guided by tablatures, a ubiquitous and intuitive guitar-specific symbolic format. Tablature easily represents guitar-specific playing techniques (e.g., bends, muted strings, and legatos), which are harder to represent in other common music notation formats such as MIDI. Our model relies on an intermediate step: the tablature is first rendered to audio with a simple sample-based virtual instrument, and Flow Matching then performs style transfer to transform the virtual-instrument audio into more realistic-sounding examples. The resulting model is quick to train and to run at inference, and it requires less than 6 hours of training data. We report objective evaluation metrics together with a listening test, which show a significant improvement in the realism of guitar audio generated from tablatures.
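The abstract does not specify GuitarFlow's probability path, source distribution, conditioning, or network, so the following is only a hedged illustration of the general technique. It uses conditional flow matching with a straight-line path between a source sample `x0` and a target `x1`: training regresses the constant velocity `x1 - x0` at the interpolated point `x_t`, and sampling integrates the velocity field from t = 0 to t = 1 with forward Euler. The names `cfm_training_pair`, `cfm_loss`, `euler_sample`, the `cond` argument (which in this setting might carry features of the virtual-instrument rendering or the tablature), and the `oracle` field standing in for a trained network are all hypothetical.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Straight-line probability path between a source and a target sample.

    x0: source sample (e.g. Gaussian noise; an assumption here)
    x1: target sample (e.g. features of a real guitar recording)
    t:  time in [0, 1]
    Returns the interpolated point x_t and the regression target u = x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    u = x1 - x0
    return x_t, u


def cfm_loss(v_pred, u):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u) ** 2))


def euler_sample(velocity_fn, x0, cond, steps=32):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t never reaches 1, so 1 - t > 0 below
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Toy check with a hypothetical oracle instead of a trained network: if the
# target is known exactly (x1 = cond), the ideal velocity along the straight
# path is (cond - x) / (1 - t), and Euler integration lands on cond.
rng = np.random.default_rng(0)
cond = rng.normal(size=8)  # stand-in for the desired output features
x0 = rng.normal(size=8)    # source (noise) sample
oracle = lambda x, t, c: (c - x) / (1.0 - t)
x1_hat = euler_sample(oracle, x0, cond, steps=16)
```

In a real model, `oracle` would be replaced by a network v_θ(x_t, t, cond) trained by minimising `cfm_loss` at randomly drawn t in [0, 1]. A straight-line path is a common choice because its target velocity is simple to compute.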