Senior Machine Learning Engineer II - LLM

Moveworks
Mountain View, CA - HQ, Mountain View, California, United States
2025-09-11

About the job

We are looking for a Machine Learning Engineer to help create cutting-edge ML infrastructure for building and serving LLMs at Moveworks. This role will be critical in building, optimizing, and scaling end-to-end machine learning systems. The ML infra team covers a variety of responsibilities, including distributed training and inference pipelines for large language models (LLMs), model evaluation and monitoring frameworks, and LLM latency optimization. These frameworks serve as a strong foundation for the hundreds of ML and NLP models we run in production, which serve hundreds of millions of enterprise employees. We are tackling many challenges in the scalability of our services as well as in the optimization of core algorithms.

Responsibilities

Design, build, and optimize scalable machine learning infrastructure to support the training, evaluation, and deployment of large language models.

Build abstractions to automate various steps in different ML workflows

Collaborate with cross-functional teams of engineers, data analysts, machine learning experts, and product managers to build new features

Leverage your experience to drive best practices in ML and data engineering

Qualifications

Minimum

2+ years of industry experience in machine learning, infrastructure, or related fields

Experience with deep learning frameworks such as PyTorch or Hugging Face, or with LLM serving frameworks such as vLLM or TensorRT-LLM.

Experience with building and scaling end-to-end machine learning systems

Experience building scalable microservices and ETL pipelines

Expertise in Python and experience with a performant language such as C++ or Go

Bachelor's degree in Computer Science, Computer Engineering, Mathematics, or an equivalent field.

A love of research publications in the machine learning and software engineering communities

Effective communicator with experience collaborating cross-functionally with other teams

Preferred

Experience with ML inference optimization using TensorRT.

Experience with distributed training frameworks such as DeepSpeed.

Experience managing and scaling GPU inference services with Kubernetes.