Pengyu Cheng

Google Scholar ID: eeQ_yCkAAAAJ

Alibaba Group

machine learning, natural language processing

Citations & Impact

All-time

Citations

1,479

Contact

Email: pengyucheng95@gmail.com

Publications

14 items

PolicyAlign: Direct Policy-Based Safety Alignment for Large Language Models

2026

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

2026

F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation

2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

2026

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

2026

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

2026

Borderless Long Speech Synthesis

2026

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

2026

Resume

Academic Achievements

Publications include 'Self-playing Adversarial Language Game Enhances LLM Reasoning' and 'Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Games'. Served as an Area Chair for ARR 2025. Papers accepted at NAACL 2025, NeurIPS 2024, EMNLP 2024, ACL 2024, and other international conferences.

Research Experience

Currently a researcher at Alibaba Group, leading the Quark Foundation LLM RL Team. Previously worked with the RL & Agent Team at Moonshot AI (Kimi) and the Hunyuan LLM Team at Tencent AI Lab. Conducted research on Bayesian and probabilistic machine learning during graduate school.

Education

Received a Ph.D. from the Department of Electrical and Computer Engineering at Duke University in 2021, advised by Dr. Lawrence Carin. Graduated with a B.S. from the Department of Mathematical Sciences at Tsinghua University in 2017.

Background

Researcher at Alibaba Group, leading the Quark Foundation LLM RL Team. Focuses on enhancing LLM capabilities via RLHF, RLVR, and agentic RL. Previously a member of the RL & Agent Team at Moonshot AI (Kimi) and the Hunyuan LLM Team at Tencent AI Lab. Research interests include LLM self-play, alignment (RLHF), text generation, and NLP fairness.