About the job
This role sits at the intersection of frontier audio models and real-world deployment. You'll own the applied post-training work that adapts LFM2.5-Audio for customer use cases end-to-end, from data generation through delivery. Most roles force a trade-off between customer impact and foundational work; this one gives you both: deep ownership over how audio models are adapted, evaluated, and shipped, and a direct line into the evolution of Liquid's post-training and audio stacks.
Responsibilities
Act as the technical owner for enterprise audio post-training engagements.
Translate customer requirements into concrete post-training specifications and workflows for LFM2.5-Audio and future audio models.
Design and build function calling capabilities for audio models, training them to map spoken user intents to structured tool calls (API invocations, parameter extraction, confirmation flows).
Design and execute data generation pipelines for speech-to-speech and text-to-text training, including synthetic dialogue, function calling examples, and intent-action pairs.
Run supervised fine-tuning, preference alignment, and reinforcement learning workflows on audio language models.
Design task-specific evaluations for audio function calling (intent recognition accuracy, parameter extraction accuracy, end-to-end task completion) and feed the findings back into core post-training pipelines.
Qualifications
Minimum
Hands-on experience with post-training for language models (SFT, preference alignment, and/or RL).
Experience with data generation and evaluation pipelines for LLM or audio model training.
Strong intuition for data quality and evaluation design.
Familiarity with function calling, tool use, or structured output training for language models.
Preferred
Experience with speech or audio language models (speech-to-speech, ASR, TTS, or multimodal audio-text systems).
Prior exposure to customer-facing or applied ML delivery environments.
Experience with alignment or RL techniques beyond basic supervised fine-tuning.
Familiarity with on-device or low-latency inference constraints.