Published papers include 'AHELM: A Holistic Evaluation of Audio-Language Models' and 'ViLBench: A Suite for Vision-Language Process Reward Modeling', among others, accepted to venues such as NeurIPS 2025 and EMNLP 2025. Also involved in the development of open-source projects such as OpenVision.
Research Experience
Currently working on controllable text generation (CTG), efficient generation, and multimodal generation (MMGen). Also interested in AI-safety problems in LLM-based systems. Joining ByteDance Seed as a Student Research Scientist in June 2025.
Education
Ph.D. student at UCSC CSE, advised by Prof. Cihang Xie and Prof. Yuyin Zhou; M.Eng. from UCAS.
Background
Research interests: natural language processing (NLP), multi-modal learning, and their applications. Particularly interested in efficient and controllable generation (e.g., unsupervised or plug-and-play methods), multi-modal interactions (e.g., visual dialogue, captioning), and the combination of both. The ultimate goal is to empower any off-the-shelf language model to understand real-world experiences and interact with people.
Miscellany
Open to research collaborations and looking for internship positions for summer 2025. Contact: tuisaac163(at)gmail.com, Google Scholar, Github, Twitter