Published several papers, including "MMComposition: Revisiting the Compositionality of Pre-trained Vision-Language Models" (arXiv 2024) and "PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3" (ICCV 2023). Also involved in multiple projects, such as developing diagnostic benchmarks to assess the capabilities of MLLMs and designing new MLLMs with enhanced competencies.
Research Experience
Currently a research scientist at the MIT-IBM Watson AI Lab. Conducted PhD research under the guidance of Prof. Jiebo Luo at the University of Rochester.
Education
PhD from the University of Rochester, advised by Prof. Jiebo Luo (Fellow of ACM/AAAI/IEEE/NAI/AIMBE/IAPR/SPIE); Master's degree from Peking University; Bachelor's degree from South China University of Technology.
Background
Research Interests: Generative AI, particularly Multimodal LLMs (MLLMs) and Pre-trained Language Models (PLMs), with a focus on addressing core limitations such as compositionality, fine-grained visual perception, robustness, and reasoning.
Miscellany
Contact: hhua2 [A-T] cs.rochester [D-O-T] edu. Find more information on Google Scholar, GitHub, and LinkedIn.