Paper 'Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners' on Arxiv.
Paper 'CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning' accepted by AAMAS 2026.
Paper 'Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning' accepted by AAAI 2025.
Paper 'A two-stage reinforcement learning-based approach for multi-entity task allocation' accepted by Engineering Applications of Artificial Intelligence.
Paper 'Exploration and Anti-Exploration with Distributional Random Network Distillation' accepted by ICML 2024.
Paper 'BATON: Aligning Text-to-Audio Model with Human Preference Feedback' accepted by IJCAI 2024.
Paper 'Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model' accepted by CVPR 2024.
Paper 'Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model' selected as HuggingFace daily paper.
Paper 'Exploration by Random Distribution Distillation' on Arxiv.
Paper 'GTLMA: Generalizable Hierarchical Learning for Tasks with Variable Entities' presented at 2023 International Conference on Frontiers of Robotics and Software Engineering (FRSE).
Paper 'CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings' (Oral).
Research Experience
Researcher on the Tencent Hunyuan X team, conducting research on reinforcement learning and large language models.
Education
Graduated from Tsinghua University in 2025 with a degree in Artificial Intelligence, supervised by Prof. Xiu Li, with extensive research guidance from senior labmate Jiafei Lyu.
Background
Currently a researcher on the Tencent Hunyuan X team, focusing on reinforcement learning (RL) and large language models (LLMs). Main research interests include reinforcement learning, especially LLM post-training, exploration mechanisms, and multi-agent reinforcement learning. Particularly interested in applying mathematical theory to optimize LLM and RL methods.