Scholar

Han Shen

Google Scholar ID: UeWSr6oAAAAJ

Research Engineer, Ant Group; Ph.D., Rensselaer Polytechnic Institute

Optimization · Reinforcement Learning · Alignment


Citations & Impact

Citations (all-time): 463


Contact

Email: shenhanhs@gmail.com

Publications

8 items

Normalizing Flow-Enhanced Message Passing for Multirobot Collaborative Localization

2026


Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models

2026


Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron

2026


On Entropy Control in LLM-RL Algorithms

2025


Kwai Keye-VL 1.5 Technical Report

2025


Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models

2025


Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

arXiv.org · 2024


On Penalty-based Bilevel Gradient Descent Method

International Conference on Machine Learning · 2023


Resume (English only)

Academic Achievements

Paper 'AEnt' published; its asynchronous implementation is incorporated into the highly scalable RL framework AReaL.
Paper 'SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection' accepted at ICLR 2025.
Paper 'Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF' accepted at ICML 2024; extended work in JMLR.
Paper 'Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning' published.
Paper 'On Penalty-based Bilevel Gradient Descent Method' accepted at ICML 2023; extended work in Mathematical Programming.
Paper 'Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach' accepted as an oral presentation at ICLR 2023.

Education

Ph.D. from Rensselaer Polytechnic Institute (RPI), supervised by Dr. Tianyi Chen (now at Cornell Tech). He was the first Ph.D. student in Dr. Chen's group, focusing on optimization and reinforcement learning.

Background

Currently a senior research engineer at Ant Group, working on a variety of problems in LLM alignment and reinforcement learning. Previously, he was a research intern at IBM Research AI, where he collaborated with Pin-Yu Chen, Payel Das, Songtao Lu, Xiaodong Cui, and many other talented researchers. His research at IBM focused on LLM alignment and offline RL.

Miscellany