About the job
As an Applied Scientist in Amazon Fulfillment Technology, you will lead the development of agentic systems that assist with operational decision making and orchestration. You will build end-to-end agentic systems that combine multi-agent orchestration, tool use, memory, and action execution. You will train LLMs using a combination of rejection sampling, supervised fine-tuning (SFT), continual post-training, and reinforcement learning (RL). These systems are deployed to Amazon buildings, and you will also design and run rigorous offline and online evaluations. Your work will use the latest LLMs to develop capabilities for agentic reasoning, coding, and analytics. You will also lead research projects on unsolved problems, mentor interns, and author papers summarizing your findings for external publication.
Responsibilities
Generating training and preference data for specific use cases (reasoning trajectories, tool traces).
Reward modeling and policy optimization for LLMs: DPO, IPO, RLHF/RLAIF with PPO/GRPO, rejection sampling.
Supervised fine-tuning on step-by-step trajectories and tool-use traces.
Verbal reinforcement learning and continual learning.
RL for LLMs, offline RL, and off-policy evaluation.
Agentic memory and state management; episodic and semantic memory; vector search; grounding with RAG.
Evaluation: developing decision-quality metrics and scaling LLM-based evaluations.
Qualifications
Minimum
3+ years of experience building models for business applications
PhD, or a Master's degree and 4+ years of experience in CS, CE, ML, or a related field
Experience programming in Java, C++, Python or related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
Preferred
Experience using Unix/Linux
Experience in professional software development