About the job
We are seeking a highly skilled Principal AI/ML Engineer to join our team and build the next generation of IT networking. You will help lead the team through a major technology transformation: running AI on-prem and building infrastructure that integrates enterprise-ready platforms on a solid foundation of automation. We are looking for a passionate engineer who will solve networking problems with AI.
Responsibilities
Lead architecture, design, and implementation of network automation platforms across datacenter, cloud, campus, and enterprise environments.
Build source-of-truth–driven automation workflows using in-house platforms and authoritative network data models.
Design and maintain scalable data models for sites, fabrics, roles, interfaces, addressing, and deployment intent.
Generate intent-based deployment artifacts (cutsheets, cable matrices, rack elevations, port maps, deployment docs) from network models.
Build configuration generation pipelines using templating + IaC patterns to render device/service configs from model data.
Develop multi-vendor provisioning, onboarding, and ZTP workflows for network platforms and services.
Create automated validation/health-check tooling (pre/post checks, compliance, readiness) and integrate with CI/CD and ops systems.
Collaborate cross-functionally and provide technical leadership by setting standards (reliability, security, testability, docs) and mentoring engineers.
Qualifications
Minimum
Bachelor’s degree (or equivalent experience) in Computer Science, Computer Engineering, Electrical Engineering, Information Systems, or a related field.
15+ years in network/infrastructure engineering, including 7+ years building production-grade network automation.
Strong software engineering skills in Python and Golang (required); experience with YAML, Bash, or JavaScript is a plus.
Proven ability to design and deliver large-scale network automation using IaC and API-driven approaches.
Hands-on experience with DCIM/IPAM/Source-of-Truth platforms (e.g., Nautobot/NetBox), including data modeling and API integration.
Experience building config generation pipelines using templating/automation frameworks (e.g., Jinja2, Ansible).
Strong experience with Terraform/Ansible (or similar), including reusable modules, versioned workflows, and pipeline integration.
Deep understanding of datacenter networking fundamentals: TCP/IP, switching/routing, BGP, EVPN/VXLAN.
Experience across multi-vendor network platforms/NOS (e.g., NVIDIA/Mellanox, Arista, Cisco, Juniper) and automating via REST/CLI with secure access patterns.
Strong DevOps mindset: CI/CD (Jenkins/GitLab), ZTP/onboarding, automated validation/compliance/health checks, strong Linux fundamentals, and clear cross-functional communication/ownership.
Preferred
Proven automation experience generating deployment artifacts from modeled intent (cutsheets, cable matrices, rack elevations, port mappings).
Experience with large-scale datacenter fabrics, including AI/ML infrastructure, GPU cluster networking, and HPC environments.
Cloud and hybrid networking expertise across Google Cloud, Azure, and Oracle Cloud, including cloud exchange/DCI providers (e.g., Equinix).
Broad multi-vendor platform experience (Arista, Cumulus, Cisco, Palo Alto, load balancers) plus observability integration (Prometheus/Grafana) tied into automation/validation workflows.
Strong platform engineering maturity: Kubernetes/containerization and Containerlab-based testing, principal-level architecture/standards/reuse, and operational documentation via Confluence/Jira/ServiceNow.