Scholar
Josef Dai
Google Scholar ID: eRmX5AsAAAAJ
Zhejiang University
Alignment
Citations & Impact (all-time)
Citations: 3,007
H-index: 12
i10-index: 14
Publications: 20
Co-authors: 5 (list available)
Publications
6 items
Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models (2025). Cited: 0
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning (2025). Cited: 0
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback (2024). Cited: 0
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference (2024). Cited: 50
Language Models Resist Alignment: Evidence From Data Compression (2024). Cited: 2
Reward Generalization in RLHF: A Topological Perspective (2024). Cited: 4
Co-authors
5 total
Jiaming Ji (吉嘉铭), Peking University
Boyuan Chen, Peking University
Jiayi Zhou, Peking University (Ph.D. student)
Tianyi Alex Qiu, Peking University
Donghai Hong, Peking University