About the job
Our Audio team is building frontier speech-language models that handle speech-to-text (STT), text-to-speech (TTS), and speech-to-speech in a single architecture. This role sits at the center of applied audio model development, working directly with the technical lead to ship production systems that run on-device under real-time constraints. You will own critical workstreams across data pipelines, evaluation systems, and customer deployments. If you want high ownership of rare technical problems in a small, elite team where your code ships, this is the role.
Responsibilities
Build and scale data pipelines for audio model training, including preprocessing, augmentation, and quality filtering
Design, implement, and maintain evaluation systems that measure multimodal performance across internal and public benchmarks
Fine-tune and adapt audio models for customer-specific use cases, owning delivery from requirements through deployment
Contribute production code to the core audio repository, collaborating with infrastructure and research teams
Support experimentation under real hardware constraints, shifting between customer work and core development as priorities evolve
Qualifications
Minimum
Strong programming fundamentals with demonstrated ability to write clean, maintainable, production-grade code
Experience building and shipping production ML systems beyond model training (data pipelines, evals, serving infrastructure)
Proficiency in PyTorch and familiarity with distributed training frameworks (DeepSpeed, FSDP, or similar)
Track record of collaborating effectively in shared codebases with high engineering standards
Preferred
Direct experience with audio/speech models (ASR, TTS, vocoders, diarization, or speech-to-speech systems)
Experience designing and running large-scale training experiments on distributed GPU clusters
Open-source contributions that demonstrate code quality and engineering judgment