Researcher, Agentic Post-Training

About the job

OpenAI is looking for exceptional researchers to join the Post-Training Frontiers team, which post-trains the agentic models we ship across Codex, the API, ChatGPT Thinking, and ChatGPT Pro. The team sets up the pipeline for deciding which integrations go into the post-training run, develops its own horizontal improvements to the model, and trains the final model.

Responsibilities

- Own end-to-end research and engineering projects that improve the final post-training of OpenAI’s agentic models.

- Decide, together with partner teams, which integrations are ready for inclusion in major model runs.

- Develop horizontal model improvements across factuality, instruction following, tool/function calling, multi-agent behavior, reasoning-effort calibration, and other broad capabilities.

- Build and improve training, evaluation, grading, and data infrastructure for large-scale RL/post-training runs.

- Create evals and diagnostics that help us understand whether a model is ready to ship.

- Improve the feedback loop from real product usage into post-training, including better ways to learn from implicit user feedback.

- Collaborate closely with Codex, API, ChatGPT, product, training, and other post-training teams to make frontier models more useful, reliable, and agentic.

Qualifications

Minimum. You:

- Have strong ML fundamentals and hands-on experience with LLMs, RL, RLHF, post-training, evals, or model training.

- Are an unusually strong engineer who can move quickly in complex systems and make pragmatic technical decisions.

- Can own ambiguous problems end-to-end without needing a tightly specified roadmap.

- Care more about impact than method, and are happy to do unglamorous but load-bearing work when it matters.

- Have excellent taste in model behavior and can reason about what “good” looks like across many user-facing domains.

- Are comfortable working across research, infrastructure, data, evals, and product boundaries.

- Are excited to train and ship the frontier agentic models that power Codex, ChatGPT, and the API.

Preferred

- Experience with large-scale model training or RL systems.

- Experience building evals, graders, reward models, or data pipelines for LLM training.

- Experience with coding agents, tool-using agents, browser/computer-use agents, function calling, or multi-agent systems.

- Background in quant, systems, infrastructure, or other environments where you built reliable machinery for high-stakes experimentation.

- Evidence of strong product taste, especially around writing, design, code generation, or agent workflows.