About the job
We are seeking passionate and innovative engineers to design, build and manage cutting-edge networking infrastructure that powers large-scale AI training. This role focuses on developing next-generation networking capabilities to ensure high performance, low latency, and minimal jitter for distributed AI workloads. You will play a critical role in enabling state-of-the-art AI systems to achieve their full potential.
Responsibilities
Demonstrates some knowledge of data: knows what data is needed, knows how to find new or missing data, and can describe defects and their relevance to product and service targets. Identifies patterns and trends in data and interprets them to inform decisions related to products and/or services.
Collaborates with teams across the organization to support and manage safe and secure network deployments. Works with machine-readable definitions to manage deployments.
Supports the management of incidents by applying technical knowledge to diagnose and triage issues with a commitment to maintaining the quality of products and services. Takes notes during incidents and participates in postmortem and root cause analysis processes.
Performs testing and validation of network devices, firmware, and configurations. Defines and implements test cases with existing automation tools, and exposes test coverage gaps.
Triages, troubleshoots, and repairs live site issues by applying an understanding of network components and features (e.g., device operating systems) as well as problem management tools (e.g., root cause analysis, trend analysis, postmortems) to discover and drive solutions with minimal or no disruption to customers. Actively participates in on-call/DRI duties to troubleshoot, and may resolve, incidents in production.
Monitors network telemetry and performs analyses to identify patterns that reveal errors and unexpected problems. Makes suggestions on improvements to monitoring based on observations and experience.
Provides instructions to datacenter or network site staff/technicians on how to securely repair, replace, and maintain physical network hardware and components deployed in production. Identifies gaps and inefficiencies in processes related to securely installing and deploying new hardware and components and provides instructions to address gaps.
Qualifications
Minimum
Master's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in network design, development, and automation OR Bachelor's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 2+ years technical experience in network design, development, and automation OR equivalent experience. Other Requirements: Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to the following spec
Preferred
Doctorate Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field OR Master's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 3+ years technical experience in network design, development, and automation OR Bachelor's Degree in Electrical Engineering, Optical Engineering, Computer Science, Information Technology, or related field AND 5+ years technical experience in network design, development, and automation OR equivalent experience.