Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
MegaMath: Pushing the Limits of Open Math Corpora
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
What Are Tools Anyway? A Survey from the Language Model Perspective
OpenAgents: An Open Platform for Language Agents in the Wild
Lemur: Harmonizing Natural Language and Code for Language Agents
Binding Language Models in Symbolic Languages
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Research Experience
Conducted research at HKUNLP and Microsoft Research Asia.
Education
Currently a Ph.D. student at UC San Diego, advised by Prof. Zhiting Hu. Previously received B.E. and M.E. degrees in Computer Science (IEEE class) from Shanghai Jiao Tong University, with research experience at HKUNLP advised by Prof. Tao Yu and at Microsoft Research Asia working with Haoyu Dong.
Background
Research interests include natural language processing, particularly large language model (LLM) reasoning and LLMs' interaction with the digital world. Before the LLM era, the focus was on structured-data reasoning over tables and spreadsheets.