Member of Technical Staff - ML Performance

Modal
NYC, SF, Stockholm
2026-04-21

About the job

We are looking for strong engineers with experience making ML systems performant at scale. If you want to contribute to open-source projects and to Modal's container runtime, pushing language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!

Qualifications

Minimum

- 5+ years of experience writing high-quality, high-performance code.

- Experience working with PyTorch (torch), high-level ML frameworks, and inference engines such as vLLM or TensorRT.

- Familiarity with NVIDIA GPU architecture and CUDA.

- Experience with ML performance engineering. Tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.

Preferred

- Familiarity with low-level operating system foundations (the Linux kernel, file systems, containers, etc.).