About the job
Running machine learning (ML) algorithms at our scale often requires solving novel systems problems. As a Performance Engineer, you'll be responsible for identifying these problems, and then developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates here will have a track record of solving large-scale systems problems and will be excited to grow to become an expert in ML also.
Responsibilities
Implement low-latency high-throughput sampling for large language models
Implement GPU kernels to adapt our models to low-precision inference
Write a custom load-balancing algorithm to optimize serving efficiency
Build quantitative models of system performance
Design and implement a fault-tolerant distributed system running with a complex network topology
Debug kernel-level network latency spikes in a containerized environment
Qualifications
Minimum
Have significant software engineering or machine learning experience, particularly at supercomputing scale
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Preferred
High performance, large-scale ML systems
GPU/Accelerator programming
ML framework internals
OS internals
Language modeling with transformers