Selected Publications
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model (CVPR 2025; Best Paper Award)
Verbalized Representation Learning for Interpretable Few-Shot Generalization (ICCV 2025)
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models (ICLR 2025)
Matryoshka Query Transformer for Large Vision-Language Models (NeurIPS 2024)
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions (AAAI 2024)
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL 2024 Findings)
Academic Service
Conference Reviewer: ICLR 2025, NeurIPS 2025, CVPR 2025, ICCV 2025, ACL 2025, EMNLP 2025, NAACL 2025. Journal Reviewer: TPAMI and the International Journal of Robotics Research.
Awards
Best Paper Award at the CVPR 2025 Foundation Models Meet Embodied Agents Workshop; UCLA CS Departmental Fellowship Award.
Research Experience
Currently a first-year Ph.D. student in Computer Science at UCLA. Previously served as a Teaching Assistant for CSE 151A: Introduction to Machine Learning at UCSD (Winter 2023).
Education
Ph.D. in Computer Science at the University of California, Los Angeles (UCLA), advised by Prof. Kai-Wei Chang and Prof. Nanyun Peng; M.S. in Computer Science from UCLA in 2024; B.S. in Data Science from the Halicioglu Data Science Institute (HDSI) at the University of California, San Diego (UCSD) in 2023. Fortunate to have worked with Prof. Zhuowen Tu and Prof. Hao Su during undergraduate study.
Background
Primary research interest lies at the intersection of vision, language, and agentic AI. In particular, has worked on 2D and 3D vision-language models for visual understanding and embodied tasks, as well as evaluation benchmarks for multimodal models. Long-term goal: build intelligent systems that can perceive, understand, and interact with the complex physical world.
Miscellany
Contact: GitHub, Google Scholar, LinkedIn, Email, Twitter (X), DBLP. Actively looking for strong, motivated graduate and undergraduate students to collaborate with.