About the job
An applied research team within NVIDIA’s Networking Systems & Software Architecture group is solving some of AI’s hardest infrastructure problems. The team builds systems-level software that moves data between GPUs, nodes, and storage at the speed modern AI demands, spanning low-level transport optimization, hardware-software co-design, and communication frameworks that plug directly into production AI stacks. The team’s charter expands into emerging domains, including quantum computing interconnects. This Engineering Manager role leads the team responsible for that work, owning execution and setting technical direction. It calls for someone technically strong enough to drive architecture and focused on creating an extraordinary engineering organization!
Responsibilities
Lead and develop a team of systems and networking engineers building distributed AI communication systems—libraries, frameworks, and system integrations.
Set the technical roadmap in partnership with principal engineers and architects, balancing near-term delivery with long-term research bets.
Create a culture of technical excellence and open collaboration.
Handle project planning, resource allocation, and delivery timelines across concurrent workstreams.
Qualifications
Minimum
8+ years of overall software engineering experience with advanced knowledge of systems software, networking, or distributed systems, sufficient to participate credibly in architecture reviews and make informed trade-off decisions.
3+ years of direct people management.
BS, MS, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
Ability to scope a problem, set a plan, and deliver results in a fast-paced R&D environment.
Strong communication skills—comfortable speaking publicly, writing technical documents, and giving candid feedback.
Good understanding of computer architecture, memory hierarchies, DMA engines, and networking.
Proficiency in programming languages such as C, C++, Rust, and Python.
Understanding of ML systems concepts—transformer architectures, KV cache mechanics, model parallelism, or distributed training and inference patterns.
Preferred
Knowledge of ML inference frameworks (vLLM, SGLang, TensorRT-LLM) and their communication requirements.
Familiarity with NVIDIA’s hardware and software ecosystem.
Experience with agile methodologies adapted for research-focused engineering teams.