- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning, COLM 2025
- Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs, ICLR 2025
- ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models, NeurIPS 2024
- ANAH: Analytical Annotation of Hallucinations in Large Language Models, ACL 2024
- One more set: Mitigating conflict-based cache side-channel attacks by extending cache set, JSA
Co-author Papers:
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward, EMNLP 2025
- Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning, NeurIPS 2025
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner, preprint
- BackCache: Mitigating contention-based cache timing attacks by hiding cache line evictions, preprint
- Redeem myself: Purifying backdoors in deep learning models using self attention distillation, Oakland 2023
Projects Released:
- Intern-S1, July 2025
- InternThinker, December 2024
Research Experience
Conducting research on large language models at Shanghai AI Laboratory.
Education
Received a bachelor's degree from Wuhan University in 2024; currently a PhD student at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, advised by Wenwei Zhang and Kai Chen.
Background
PhD student at Shanghai Jiao Tong University, in the joint program with Shanghai AI Laboratory. Research interests lie primarily in Large Language Models (LLMs), with a focus on improving their reasoning and knowledge capabilities. Also experienced in reducing hallucinations in LLMs, including annotation, detection, and mitigation.