Yash Chandak

Google Scholar ID: OAUZgecAAAAJ

Postdoctoral Scholar, Stanford University

Reinforcement Learning · Machine Learning

Citations & Impact

All-time

Citations: 969

H-index: 12

i10-index: 16

Publications: 20

Co-authors: 14

Publications

2 items

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Conference on Empirical Methods in Natural Language Processing · 2024

Cited: 1

The GPT Surprise: Offering Large Language Model Chat in a Massive Coding Class Reduced Engagement but Increased Adopters Exam Performances

arXiv.org · 2024

Cited: 22

Resume (English only)

Academic Achievements

2025: Published 'Command A: An Enterprise-Ready Large Language Model'.

2024: Published 'Data-Efficient Policy Evaluation Through Behavior Policy Search' (with Josiah P. Hanna, Philip S. Thomas, Martha White, Peter Stone, and Scott Niekum) and 'Information Directed Tree Search: Reasoning and Planning with Language Agents' (with HyunJi Nam, Allen Nie, Jonathan Lee, and Emma Brunskill).

Research Experience

Currently works on the post-training team at Cohere. Previously a postdoc with Prof. Emma Brunskill at Stanford University.

Education

PhD from the University of Massachusetts, advised by Prof. Philip Thomas.

Background

Research Interests: Reinforcement learning and large language models.

Miscellany

Email: y[lastname]@stanford.edu

Co-authors

14 total

Philip S. Thomas

University of Massachusetts

Associate Professor of Computer Science, Stanford University

University of Alberta

Associate Professor, University of Massachusetts Amherst

Sridhar Mahadevan

Director, Adobe Research & Professor, University of Massachusetts, Amherst

Bruno Castro da Silva

University of Massachusetts