Principal GPU/NPU AI System Architect

About the job

The AI Architect will define and drive end-to-end AI system architecture for embedded and edge platforms, with deep expertise in GPU/NPU micro-architecture, AI software stacks, and model behavior. This role bridges silicon capabilities, system software, and AI models, enabling performant, power-efficient, and safe AI deployments across robotics, automotive, and industrial markets. The architect will own technical solutioning from model selection through deployment, working closely with silicon, compiler, software, and product teams, and will represent the AI architecture vision with customers and partners.

Responsibilities

Develop deep architectural understanding of GPU, NPU, and heterogeneous SoC designs, including memory hierarchies, interconnects, scheduling, and power/performance trade-offs.

Guide HW–SW co-optimization strategies for AI workloads across vision, perception, planning, and control.

Influence silicon and platform roadmaps using model-driven architectural insights from robotics, automotive, and industrial workloads.

Collaborate across silicon, system engineering, software, thermal/mechanical, security, and product teams.

Provide technical leadership to internal AI engineers and work closely with partners, ISVs, and customers.

Act as a technical authority and mentor, influencing architecture decisions without direct reporting authority.

Architect AI solutions with strong understanding of model internals (CNNs, Transformers, multi-modal models, sensor fusion, perception stacks).

Evaluate and map model characteristics (latency, memory bandwidth, precision, sparsity) onto GPU/NPU execution.

Drive model optimization strategies (quantization, pruning, distillation, compilation flows) aligned with embedded constraints.

Define and optimize AI software stacks spanning frameworks (PyTorch, ONNX, TensorRT-like runtimes); compilers, graph optimizers, and runtime schedulers; and drivers, firmware, and OS integration.

Lead solutioning for edge and embedded deployment, including OTA updates, lifecycle management, and production-grade robustness.

Ensure scalability from prototype → production → long-term maintenance.

Robotics: perception, localization, SLAM, manipulation, real-time decision pipelines.

Automotive: ADAS, autonomous perception, sensor fusion, safety-critical AI execution.

Industrial: vision inspection, predictive maintenance, autonomous systems, real-time analytics.

Translate domain use-cases into architectural requirements and reusable platform capabilities.

Qualifications

Minimum

No minimum qualifications listed.

Preferred

Deep expertise in GPU and/or NPU architecture and execution models.

Strong hands-on experience with AI models and inference pipelines, not just framework usage.

Proven background in embedded / edge AI systems.

Strong understanding of hardware-aware model optimization techniques.

Experience in robotics, automotive, or industrial AI domains.

Ability to translate customer problems into scalable architectural solutions.

Motivating leader with good interpersonal skills, able to lead cross-functionally and with external partners.