GPU/AI Application System Software Engineer Intern (System Technologies and Engineering)

About the job

The GPU/AI System Technology and Engineering Team develops highly optimized OS and system software to support deep learning and high-performance computing (HPC) workloads in large-scale data centers. We focus on delivering core software components for the next generation of AI and HPC platforms, building benchmarks, and tuning performance. Our work spans the entire hardware/software stack, from GPU drivers to deep learning frameworks, to ensure peak performance across all layers. By joining this team, you will work with some of the best engineers in the industry and gain broad exposure to the latest AI application systems and emerging technologies in computing, networking, and storage. You will gain substantial experience in GPU architecture, system software development, and GPU validation on the most advanced hardware infrastructure at massive scale.

Responsibilities

Design and implement performance benchmarks and testing methodologies to evaluate system performance.

Develop benchmark tools and optimize the performance of AI workloads, specifically large-scale LLM training and inference as well as high-performance computing (HPC).

Develop Python scripts to automate the testing of various benchmark tools.

Collaborate with internal teams to identify system bottlenecks and to debug and resolve performance issues.

Qualifications

Minimum

Must be able to commit to a 12-week full-time work period during Summer 2026

Currently pursuing a Bachelor's, Master's, or PhD degree in Electrical Engineering, Computer Engineering, Computer Science, or a related major.

Background in GPU/CPU benchmarking

Familiarity with ML/DL techniques, algorithms, and frameworks such as TensorFlow or PyTorch.

Exposure to test automation for various applications.

Proficiency in Python and C/C++

Hands-on experience with Linux-based systems

Ability to work independently and complete projects from beginning to end in a timely manner

Preferred

Strong background in one of the following fields: High-Performance Computing, ML Hardware Acceleration (e.g., GPU/TPU/RDMA), ML for Systems, or Distributed Storage.

Experience in AI model development, training, evaluation, and deployment in cloud, cluster, or on-premises environments.

Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)

Experience developing applications with CUDA

Linux kernel development experience, such as networking or device drivers.

Familiarity with Git workflows.