AI Cluster & Data Center Design Engineer

AMD
Austin, Texas, United States
2026-03-19

About the job

We are seeking a highly skilled systems engineer to architect and design scalable AI/HPC clusters, with a specific focus on rack and data center power delivery. This role involves evaluating and selecting compute, storage, networking, and power delivery components and solutions to optimize performance and reliability across global deployments. You will collaborate with cross-functional teams to deliver cutting-edge infrastructure for AI and high-performance computing workloads.

Responsibilities

- Design scalable AI/HPC clusters, including compute, storage, and networking, with a specific focus on power delivery.
- Evaluate and select CPUs, GPUs, accelerators, interconnects, and memory configurations for optimal cluster performance.
- Design leading-edge power delivery solutions for high-density AI/GPU deployments.
- Define power budgets, redundancy schemes, and fault tolerance mechanisms.
- Design network topologies to maximize overall cluster performance.
  - Understand the network performance needs of different types of workloads.
  - Understand the advantages and performance trade-offs of network topologies for AI/HPC clusters.
- Design and optimize storage solutions to maximize AI/HPC cluster performance.
  - Understand the advantages and performance trade-offs of cluster storage solutions, e.g. Lustre and Ceph.
- Work across multiple organizations with subject matter experts from hardware, software, network, data center, and operations teams to deliver scalable, efficient, and reliable compute infrastructure.

Qualifications

Minimum

- Experience in HPC, AI infrastructure, or data center systems engineering.
- Strong understanding of rack and data center power delivery.
- Knowledge of GPU/CPU architectures, PCIe, UALink, InfiniBand, and Ethernet networking.
- Familiarity with AI/ML frameworks and workload characteristics.
- Excellent problem-solving, communication, and documentation skills.

Preferred

- Experience in HPC, AI infrastructure, or data center systems engineering.
- Experience designing power delivery solutions for racks and data centers.
- Contributions to open-source HPC or AI infrastructure projects.