- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning, COLM 2025
- Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs, ICLR 2025
- ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models, NeurIPS 2024
- ANAH: Analytical Annotation of Hallucinations in Large Language Models, ACL 2024
- One more set: Mitigating conflict-based cache side-channel attacks by extending cache set, JSA
Co-author Papers:
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward, EMNLP 2025
- Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning, NeurIPS 2025
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner, preprint
- BackCache: Mitigating contention-based cache timing attacks by hiding cache line evictions, preprint
- Redeem myself: Purifying backdoors in deep learning models using self attention distillation, Oakland 2023
Projects Released:
- Intern-S1, July 2025
- InternThinker, December 2024
Research Experience
Conducting research on large language models at Shanghai AI Laboratory.
Education
Received a bachelor's degree from Wuhan University in 2024; currently a PhD student at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, advised by Wenwei Zhang and Kai Chen.
Background
PhD student at Shanghai Jiao Tong University, in the joint program with Shanghai AI Laboratory. Research interests lie primarily in Large Language Models (LLMs), with a focus on improving their reasoning and knowledge capabilities. Also experienced in reducing hallucinations in LLMs, including annotation, detection, and mitigation.