LLM2Fx-Tools: Tool Calling For Music Post-Production

📅 2025-12-01

🤖 AI Summary

This work addresses the challenge of enabling large language models (LLMs) to interpret raw audio and autonomously generate executable audio effect chains (Fx-chains) for music post-production. It proposes a multimodal tool-calling framework for audio effects, which the authors describe as the first of its kind to their knowledge, combining audio representations, structured tool interfaces, chain-of-thought (CoT) planning, and autoregressive sequence modeling to map input audio to effect types, their order, and their parameters. It also introduces LP-Fx, an instruction-following dataset with structured CoT annotations and tool calls for audio effect modules. Experiments show that the system infers Fx-chains and their parameters from pairs of unprocessed and processed audio, transfers effect settings from a reference to new content in a style-transfer setting, and, under LLM-as-a-judge evaluation, produces appropriate CoT reasoning and responses to music production queries.

📝 Abstract

This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chains) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effect types, determine their order, and estimate their parameters, guided by chain-of-thought (CoT) planning. We also present LP-Fx, a new instruction-following dataset with structured CoT annotations and tool calls for audio effect modules. Experiments show that LLM2Fx-Tools can infer an Fx-chain and its parameters from pairs of unprocessed and processed audio, enabled by autoregressive sequence modeling, tool calling, and CoT reasoning. We further validate the system in a style transfer setting, where audio effect information is extracted from a reference source and applied to new content. Finally, LLM-as-a-judge evaluation demonstrates that our approach generates appropriate CoT reasoning and responses for music production queries. To our knowledge, this is the first work to apply LLM-based tool calling to audio effect modules, enabling interpretable and controllable music production.
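
To make the idea of an executable Fx-chain concrete, here is a minimal sketch of how a list of tool calls could be run in order over a signal. It is not the paper's implementation: the effect names (`gain`, `hard_clip`), their parameters, the `TOOLS` registry, and the JSON layout are all assumptions chosen for illustration, and the chain is hand-written rather than produced by a model.

```python
import json
import math

# Hypothetical effect modules; names and parameters are illustrative only.
def gain(samples, gain_db):
    """Scale every sample by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def hard_clip(samples, threshold):
    """Clamp samples to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# Assumed registry mapping tool names to callables.
TOOLS = {"gain": gain, "hard_clip": hard_clip}

def apply_fx_chain(samples, tool_calls_json):
    """Run each tool call in order, feeding the output of one into the next."""
    for call in json.loads(tool_calls_json):
        effect = TOOLS[call["name"]]
        samples = effect(samples, **call["arguments"])
    return samples

# A chain in the shape an LLM might emit: type, order, and parameters are explicit.
chain = json.dumps([
    {"name": "gain", "arguments": {"gain_db": 6.0}},
    {"name": "hard_clip", "arguments": {"threshold": 0.8}},
])

# 64 samples of a 440 Hz sine at 44.1 kHz as a stand-in for unprocessed audio.
dry = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(64)]
wet = apply_fx_chain(dry, chain)
```

Because each call is plain structured data, the chain can be inspected, edited, or reordered before it is executed, which is one way to read the abstract's claim of interpretable and controllable production.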

Problem

Research questions and friction points this paper is trying to address.

Generates executable audio effect sequences for music post-production

Infers effect chains from unprocessed and processed audio pairs

Transfers audio effect styles from reference to new content
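
The last two items can be illustrated with a toy example: infer effect parameters from an unprocessed/processed pair, then reuse them on new content. In this sketch a simple RMS ratio stands in for the model, which is an assumption made purely for illustration; the paper uses an LLM for this step, and `estimate_gain_db`, `apply_gain`, and the single-effect chain are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_gain_db(dry, wet):
    """Toy stand-in for the model: infer a gain parameter from a dry/wet pair."""
    return 20 * math.log10(rms(wet) / rms(dry))

def apply_gain(samples, gain_db):
    """Scale every sample by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Reference pair: the processed version is the dry signal boosted by 6 dB.
ref_dry = [0.1, -0.2, 0.3, -0.1]
ref_wet = apply_gain(ref_dry, 6.0)

# Infer a one-effect "chain" from the reference pair ...
inferred = [{
    "name": "gain",
    "arguments": {"gain_db": estimate_gain_db(ref_dry, ref_wet)},
}]

# ... then apply the same settings to unrelated content (style transfer).
new_content = [0.05, 0.05, -0.05]
styled = apply_gain(new_content, **inferred[0]["arguments"])
```

A real chain would contain several effects whose order also has to be inferred, which is where a closed-form estimate like this stops being adequate.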

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based tool calling for audio effects modules

Autoregressive sequence modeling with chain-of-thought reasoning

Style transfer of audio effects from reference to new content

🔎 Similar Papers

ToolGen: Unified Tool Retrieval and Calling via Generation