Kai Yang
Google Scholar ID: mdPLH-sAAAAJ
Tencent Hunyuan
Reinforcement Learning, LLM
Citations & Impact
  • Citations (all-time): 200
  • H-index: 4
  • i10-index: 4
  • Publications: 13
  • Co-authors: 5 (list available)
Resume (English only)
Academic Achievements
  • Paper 'Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners' on arXiv.
  • Paper 'CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning' accepted by AAMAS 2026.
  • Paper 'Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning' accepted by AAAI 2025.
  • Paper 'A two-stage reinforcement learning-based approach for multi-entity task allocation' accepted by Engineering Applications of Artificial Intelligence.
  • Paper 'Exploration and Anti-Exploration with Distributional Random Network Distillation' accepted by ICML 2024.
  • Paper 'BATON: Aligning Text-to-Audio Model with Human Preference Feedback' accepted by IJCAI 2024.
  • Paper 'Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model' accepted by CVPR 2024 and selected as a HuggingFace daily paper.
  • Paper 'Exploration by Random Distribution Distillation' on arXiv.
  • Paper 'GTLMA: Generalizable Hierarchical Learning for Tasks with Variable Entities' presented at 2023 International Conference on Frontiers of Robotics and Software Engineering (FRSE).
  • Paper 'CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings' (Oral)
Research Experience
  • Researcher at the Tencent Hunyuan X team, working on reinforcement learning and large language models.
Education
  • Graduated from Tsinghua University in 2025 with a degree in Artificial Intelligence, supervised by Prof. Xiu Li, with extensive research guidance from senior labmate Jiafei Lyu.
Background
  • Currently a researcher at the Tencent Hunyuan X team, focusing on RL and LLMs. Main research interests include reinforcement learning (especially LLM post-training), exploration mechanisms, and multi-agent reinforcement learning. Skilled at, and interested in, applying mathematical theory to improve LLM and RL methods.