Staff Machine Learning Engineer – Model Optimization & Quantization

About the job

Join the Qualcomm AI Hub team (https://aihub.qualcomm.com/) and help developers integrate machine learning into their products and experiences. In this role you will build tools that help developers optimize and deploy machine learning models on edge and mobile hardware. AIMET is Qualcomm's open-source library for state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows, pushing the boundary of what's possible on resource-constrained hardware. Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs. We are seeking a talented and motivated Staff Machine Learning Engineer with expertise in optimizing and deploying ML models, especially for edge devices.

Responsibilities

Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)

Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models

Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization

Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX

Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners

Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at scale

Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware

Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub

Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub's growing model catalog

Qualifications

Minimum

Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

OR

Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

OR

PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Preferred

3+ years of industry experience in machine learning, deep learning, or AI infrastructure

Strong proficiency in Python, with hands-on experience in PyTorch, ONNX, and/or TensorFlow

Solid understanding of neural network architectures: CNNs, Transformers, LLMs, diffusion models, multimodal models

Experience with model quantization techniques: PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods

Hands-on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization

Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar quantization frameworks is a strong plus

Experience working with ONNX, TFLite/LiteRT, or other model interchange formats

Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution

Experience collaborating across teams or BUs to drive technical alignment and model delivery

Proficiency with git and software development best practices

Strong written and verbal communication skills, including the ability to write clean APIs and documentation and to engage directly with external developers

Experience with C++ for performance-critical components is a bonus

Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus

Experience with automated evaluation pipelines and model benchmarking at scale is a plus