Senior Research Scientist - Machine Learning System

About the job

The Machine Learning (ML) System sub-team combines system engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.

Responsibilities

- Develop and optimize the LLM inference framework.

- Optimize GPU and CUDA performance to build an industry-leading, high-performance LLM inference engine.

Qualifications

Minimum

- Bachelor's degree or above in computer science, electronics, automation, software engineering, or a related field.

- Proficient in C/C++, algorithms, and data structures; familiar with Python.

- Understanding of the basic principles of deep learning algorithms, familiarity with the basic architecture of neural networks, and understanding of deep learning training frameworks such as PyTorch.

Preferred

- Proficient in GPU high-performance computing optimization with CUDA; in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computing, etc.

- Familiar with TensorRT-LLM, ORCA, vLLM, etc.

- Knowledge of LLMs and experience in optimizing and accelerating them.