About the job
We are looking for strong engineers with experience in making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal’s container runtime to push language and diffusion models towards higher throughput and lower latency, we’d love to hear from you!
Qualifications
Minimum
- 5+ years of experience writing high-quality, high-performance code.
- Experience working with PyTorch, high-level ML frameworks, and inference engines (e.g., vLLM or TensorRT).
- Familiarity with NVIDIA GPU architecture and CUDA.
- Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).
Preferred
- Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).