Paper accepted at ACL 2025: 'Adversarial Preference Learning for Robust LLM Alignment'
Two papers accepted at ICML 2025: 'Emergent Response Planning in LLMs' and 'C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation'
Paper accepted at NeurIPS 2024: 'Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models'
Paper accepted at EMNLP 2024: 'Inference-Time Language Model Alignment via Integrated Value Guidance'
Paper accepted at ECCV 2024: 'MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models'
Three papers accepted at ACL 2024, including an Oral presentation: 'Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!'
Two papers accepted at CVPR 2024: 'VideoDistill: Language-aware Vision Distillation for Video Question Answering' and 'LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction'
Proposed the 'AI 45°-Law' framework toward trustworthy AGI (Dec 2024)
Released the multimodal reasoning model SafeWork-R1 (Jul 2025)