Member of Technical Staff - ML Performance

About the job

We are looking for strong engineers with experience in making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal’s container runtime to push language and diffusion models towards higher throughput and lower latency, we’d love to hear from you!

Responsibilities

- Contributing to open-source projects

- Working on Modal’s container runtime

- Pushing language and diffusion models towards higher throughput and lower latency

Qualifications

Minimum

- 5+ years of experience writing high-quality, high-performance code.

- Experience working with PyTorch, high-level ML frameworks, and inference engines (vLLM or TensorRT).

- Familiarity with NVIDIA GPU architecture and CUDA.

- Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).

Preferred

- Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).