About the job
Responsibilities
Build and maintain scalable data infrastructure to support cutting-edge AI robotics research.
Design dataset management systems including automated pipelines for data ingestion, processing, and curation.
Develop visualization and inspection tools for dataset exploration and quality assessment.
Research and implement state-of-the-art data filtering techniques including deduplication, quality scoring, and model-based filtering methods.
Collaborate directly with science teams to support research projects through both infrastructure development and hands-on technical contribution to data preparation workflows.
Qualifications
Minimum
5+ years of non-internship professional software development experience
5+ years of programming experience in at least one programming language
5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor or tech lead, or leading an engineering team
Strong software engineering background with full-stack development experience
Deep understanding of machine learning fundamentals, particularly large-scale model training
Expertise in distributed systems, cloud computing, and scalable data processing
Experience with data pipeline design, ETL processes, and data management systems
Proficiency in translating academic concepts into production systems
Preferred
5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Bachelor's degree in computer science or equivalent
Experience with dataset curation and quality assessment techniques
Knowledge of computer vision and multimodal data processing
Background in research environments or supporting ML research workflows
Experience with data visualization and annotation tooling
Familiarity with modern data filtering and deduplication methodologies