About the job
We are seeking a highly skilled systems engineer to design data center rack layouts for AI/HPC clusters. This role involves evaluating and selecting rack-level compute, storage, networking, power delivery, and cooling solutions to optimize performance and reliability across global deployments. You will collaborate with cross-functional teams to deliver cutting-edge infrastructure for AI and high-performance computing workloads.
Responsibilities
Rack & Cluster Design
- Design scalable AI/HPC data center rack layouts, including compute, storage, networking, power delivery, and cooling
- Translate high-level requirements and architecture input into detailed rack designs
- Design leading-edge thermal and power delivery for high-density deployments
Network
- Design intra-rack and rack-to-rack network connections to maximize overall cluster performance
- Translate network architecture requirements and input into rack-level designs
- Understand the network performance needs of different types of workloads
- Understand advantages and performance trade-offs of network topologies for AI/HPC clusters
Power
- Design rack-level and data center power delivery infrastructure
- Define power budgets, redundancy schemes, and fault tolerance mechanisms
- Understand differences in power delivery and regulatory requirements across global locations, e.g. the U.S., EMEA, Asia, and other regions
Storage
- Translate storage architecture requirements and input into rack-level designs
- Design and optimize storage solutions to maximize AI/HPC cluster performance
- Understand advantages and performance trade-offs of cluster storage solutions, e.g. Lustre and Ceph
Collaboration
- Work across multiple organizations with subject matter experts from hardware, software, network, data center, and operations teams to deliver scalable, efficient, and reliable compute infrastructure
Qualifications
Minimum
No minimum qualifications listed.
Preferred
- Extensive experience in HPC, AI infrastructure, or data center systems engineering
- Experience with liquid cooling or advanced thermal management
- Experience with rack-level power distribution
- Contributions to open-source HPC or AI infrastructure projects