About the job
We are seeking a Physical AI Model Optimization Engineer to help bring cutting-edge robotic AI models onto Qualcomm Dragonwing chipsets using Qualcomm’s internal deployment and optimization toolchains. This role is highly execution-focused and centers on applying existing Qualcomm tools, workflows, and compilers to onboard, optimize, validate, and deploy advanced AI models for real-time robotic systems.
Responsibilities
Use Qualcomm’s internal AI toolchains to onboard, convert, and optimize large-scale research models for Dragonwing deployment.
Apply Qualcomm-supported quantization, compression, and mixed-precision workflows to meet latency, memory, and power constraints.
Execute hardware-aware graph transformations and operator adjustments using Qualcomm-provided graph tools and compilers.
Profile model performance across heterogeneous compute (NPU/DSP/GPU/CPU) using Qualcomm profiling utilities and identify optimization opportunities.
Validate accuracy, stability, and runtime behavior of quantized and optimized models on real robotic hardware.
Build automation, scripts, and reproducible processes around Qualcomm’s toolchains to accelerate onboarding throughput.
Where needed to support model deployment, provide bug reports, patches, or minor contributions back to Qualcomm tools; this is not a primary responsibility.
Work closely with platform and robotics engineering teams to integrate optimized models into production systems.
Qualifications
Minimum
Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
Preferred
3+ years of experience in embedded or on-device AI, model optimization, or performance engineering.
Strong hands-on experience applying quantization (PTQ/QAT), pruning, compression, mixed-precision tuning, and model transformation techniques.
Experience using vendor-specific AI optimization pipelines (Qualcomm preferred; others such as TensorRT, TVM, XLA, or ONNX Runtime are also relevant).
Proficiency with PyTorch (preferred), including model graph manipulation, tracing, and conversion workflows.
Understanding of the numerical behavior and accuracy trade-offs of low-precision AI.
Ability to diagnose and address performance bottlenecks in compute, memory, and bandwidth.
Experience deploying AI models to embedded or heterogeneous compute systems.
Experience with the Qualcomm AI Stack, NPU/DSP optimization, or related chipset-specific AI workflows.
Robotics domain experience, especially with real-time constraints.
Familiarity with large multimodal or transformer-based architectures.
C++ skills for minor kernel-level tweaks or integration fixes.