Senior Perception Engineer, Obstacle Foundation Models

About the job

We are seeking an exceptional Senior Perception Engineer to help design and productize NVIDIA’s next-generation autonomous driving perception stack. You will work on the core 3D obstacle perception pipeline, contribute to architecture and algorithm design, and remain deeply hands-on with implementation, including modern transformer-based, multi-modal, and vision-language techniques where they add real value.

Responsibilities

Develop and improve the technical design, architecture, and roadmap for 3D obstacle perception to support end-to-end autonomous driving functionalities, leveraging state-of-the-art CNN and transformer-based architectures where appropriate.

Design and implement advanced 3D perception models using multi-camera inputs and/or multi-sensor fusion (camera, radar, lidar) for obstacle detection and tracking, including opportunities to explore BEV and transformer-based 3D perception.

Build efficient, production-grade deep learning models: define objectives with the team, select and prototype architectures, run experiments, and follow best practices for training and evaluation, using techniques such as large-scale pretraining, distillation, and parameter-efficient fine-tuning (e.g., LoRA).

Help define and maintain KPI frameworks to quantify perception performance; analyze large-scale real and synthetic datasets to identify failure modes and systematically improve accuracy, robustness, and efficiency, incorporating approaches like self-supervised and representation learning when beneficial.

Contribute to the data strategy for perception: specify data and labeling requirements, help prioritize data collection and annotation, and collaborate with data and ground-truth teams, including on model-in-the-loop tooling and model-assisted workflows such as active learning, auto-labeling, and vision-language models (VLMs).

Collaborate with safety, systems, and software teams to ensure perception solutions meet product requirements for safety, latency, resource usage, and software robustness, and are ready for deployment at scale.

Qualifications

Minimum

PhD, MS, or BS (or equivalent experience) in Computer Science, Computer Engineering, or a related technical field, with 4+, 6+, or 8+ years of relevant experience, respectively.

Hands-on experience developing deep learning–based perception or closely related systems for complex real-world problems, with strong proficiency in frameworks such as PyTorch and a track record of taking models from prototype to production.

Proven experience in data-driven development, including close collaboration with data, labeling, and ground-truth teams on data strategy, labeling quality, and iterative model improvement.

Strong programming skills in Python and/or C++, with experience building reliable, high-performance, production-quality software.

Excellent communication and collaboration skills, with the ability to work effectively across multidisciplinary teams.

Preferred

Experience designing and deploying perception solutions for autonomous driving or robotics using camera-based deep learning at scale.

Hands-on experience architecting and deploying DNN-based perception pipelines on embedded or real-time platforms, including optimization for latency, memory, and compute constraints. Experience with modern architectures such as CNNs and transformers, plus familiarity with techniques like large-scale pretraining, parameter-efficient fine-tuning (e.g., LoRA), or vision-language models (VLMs).

Strong publication record or recognized contributions in deep learning, computer vision, or autonomous systems at leading conferences/journals (e.g., CVPR, ICCV, NeurIPS, IROS).

Deep understanding of 3D computer vision fundamentals, including camera modeling and calibration (intrinsic and extrinsic), multi-view geometry, and 3D representations, ideally with experience applying these concepts in transformer-based 3D or BEV perception pipelines.