Research Engineer/Research Scientist, Audio

About the job

As a researcher on the Audio team, you'll work across the full stack of audio ML: developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs.

Responsibilities

- Developing audio codecs and representations
- Sourcing and synthesizing high-quality audio data
- Training large-scale speech language models and large audio diffusion models
- Developing novel architectures for incorporating continuous signals into LLMs
- Working closely with teams across pretraining, finetuning, reinforcement learning, production inference, and product to take advanced audio technologies from early research to high-impact real-world deployments

Qualifications

Minimum

- Have hands-on experience with training audio models, whether that's conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, or generative audio models
- Genuinely enjoy both research and engineering work, and you'd describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other
- Are comfortable working across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization
- Have deep expertise with JAX, PyTorch, or large-scale distributed training, and can debug performance issues across the full stack
- Thrive in fast-moving environments where the most important problem might shift as we learn more about what works
- Communicate clearly and collaborate effectively
- Are passionate about building conversational AI that feels natural, steerable, and safe
- Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly

Preferred

- Large language model pretraining and finetuning
- Training diffusion models for image and audio generation
- Reinforcement learning for large language models and diffusion models
- End-to-end system optimization, from performance benchmarking to kernel optimization
- GPUs, Kubernetes, PyTorch, or distributed training infrastructure