AI Infrastructure Engineer · Zoom Video Communications Inc.

About the job

We are seeking an experienced AI Infrastructure Engineer to join our AI Incubation team. You will focus on building and optimizing large-scale training infrastructure for Large Language Models (LLMs). The ideal candidate combines engineering fundamentals with practical experience in AI infrastructure development, demonstrating both technical depth and the ability to deliver scalable solutions for complex AI systems.

Responsibilities

Designing and developing scalable AI infrastructure solutions for training and deploying large language models

Building and optimizing distributed training platforms using cutting-edge technologies

Implementing and maintaining containerized AI environments using Docker and Kubernetes

Optimizing CUDA kernels for maximum GPU utilization and performance

Developing platform software to support AI/ML workflows

Collaborating with AI researchers to implement efficient training and inference pipelines

Qualifications

Minimum

Have a bachelor's degree in Computer Science, Engineering, AI, Machine Learning, Distributed Systems, or a related field

Have 5+ years of software engineering experience with a focus on infrastructure and systems

Have expertise in GPU programming and CUDA optimization

Have experience with container technologies (Docker, Kubernetes), distributed systems, and cloud computing

Demonstrate experience building large-scale distributed systems and optimizing neural network performance

Possess programming skills in Python, C++, and CUDA, and experience with deep learning frameworks and libraries (PyTorch, Transformers)

Preferred

No preferred qualifications listed.