Sr. SDE - ML Data Infrastructure, Frontier AI Robotics

Amazon
San Francisco, California, USA | 2025-10-16 | Onsite

Responsibilities

Build and maintain scalable data infrastructure to support cutting-edge AI robotics research.

Design dataset management systems including automated pipelines for data ingestion, processing, and curation.

Develop visualization and inspection tools for dataset exploration and quality assessment.

Research and implement state-of-the-art data filtering techniques including deduplication, quality scoring, and model-based filtering methods.

Collaborate directly with science teams to support research projects through both infrastructure development and hands-on technical contribution to data preparation workflows.

Qualifications

Minimum

5+ years of non-internship professional software development experience

5+ years of programming experience with at least one programming language

5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems

Experience as a mentor or tech lead, or leading an engineering team

Strong software engineering background with full-stack development experience

Deep understanding of machine learning fundamentals, particularly large-scale model training

Expertise in distributed systems, cloud computing, and scalable data processing

Experience with data pipeline design, ETL processes, and data management systems

Proficiency in translating academic concepts into production systems

Preferred

5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

Bachelor's degree in computer science or equivalent

Experience with dataset curation and quality assessment techniques

Knowledge of computer vision and multimodal data processing

Background in research environments or supporting ML research workflows

Experience with data visualization and annotation tooling

Familiarity with modern data filtering and deduplication methodologies