Software Engineer, Distributed Data Systems (Sora)

OpenAI
San Francisco, CA, USA
2025-11-14

About the job

As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal training and evaluation at OpenAI. You’ll manage distributed data pipelines, collaborate closely with researchers to translate requirements into robust systems, and harden pipelines that serve as the backbone for Sora’s rapid iteration cycles.

Responsibilities

Design, build, and maintain data infrastructure systems, such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security.

Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient.

Partner with researchers to deeply understand requirements and translate them into production-ready systems.

Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation.

Qualifications

Minimum

Have strong experience with distributed systems and large-scale infrastructure, along with a strong interest in data.

Are detail-oriented and bring rigor to building and maintaining reliable systems.

Demonstrate excellent software engineering fundamentals and organizational skills.

Are comfortable with ambiguity and rapid change.

Preferred

No preferred qualifications listed.