Software Engineer, ML Data Infrastructure

About the job

Nuro takes a machine-learning-first approach to autonomous driving technology. In an ML-first system, overall performance depends heavily on the quantity and diversity of the training and evaluation data. The ML Data Infrastructure team advances Nuro's autonomous driving system by building scalable, reliable data infrastructure that produces training and evaluation data from both on-road collected logs and simulation logs. The team also works closely with system engineers to thoroughly validate the autonomous driving system before it is deployed.

Responsibilities

Design and develop unified, introspectable, large-scale batch and streaming data pipelines that can ingest and process data across a wide range of use cases relevant to evaluation.

Design and implement a storage system that can accommodate the large volume and wide variety of evaluation and performance metrics.

Build intuitive dashboards and reports that present evaluation results and make it easy to compare runs, highlighting both improvements and regressions in the ML components and in the overall system.

Develop and maintain continuous testing and monitoring systems that ensure the integrity and resilience of our data and data pipelines.

Develop data mining tools that apply ML techniques to support the data discovery needs of Autonomy teams, including Perception, Behavior, and Mapping.

Develop data annotation tools that help first-party and third-party labeling workforces produce high-fidelity perception, mapping, and driving-trajectory labels.

Scale the production of annotation labels by applying state-of-the-art ML techniques.

Qualifications

Minimum

A BS, MS, or PhD degree, plus 1+ years of relevant work experience

Strong proficiency in Python or similar languages

Domain experience: Experience working with large-scale data and building scalable, reliable systems and data pipelines; ability to understand and design complex systems

Technical excellence: Ability and willingness to dive deep into implementation details and to drive technical standards and best practices across the broader software organization

A bachelor's degree in Computer Science, Electrical Engineering, or a closely related field

Preferred

Strong proficiency in C++ or other high-performance low-level languages

Strong knowledge of GCP, GCS, BigQuery, or PostgreSQL

Knowledge of data engineering, including its tooling and best practices

Knowledge of batch and streaming data processing, warehousing, and analytics solutions

Experience working with large-scale distributed data systems

Experience with system and framework design

Experience with data workflow orchestration platforms