Paper 'On Pre-training of Multimodal Language Models Customized for Chart Understanding' accepted by NeurIPS Workshop on Adaptive Foundation Models (Oct 2024)
Paper 'Surprising Observations in Basic Vision Language Model Capabilities' accepted by an ECCV 2024 workshop (Aug 2024)
TAM-VT model ranked 5th in the VOTS2024 Challenge at ECCV 2024 (May 2024)
Paper 'IOU-Aware Multi-Expert Cascade Network via Dynamic Ensemble for Long-tailed Object Detection' accepted by ICASSP (Feb 2023)
Two papers accepted by AAAI (Nov 2022): 'Feature Pyramid Diffusion for Complex Scene Image Synthesis' and 'Target-free Text-guided Image Manipulation'
Paper 'Paraphrasing Is All You Need for Novel Object Captioning' accepted by NeurIPS 2022 (Sep 2022)
Master’s thesis on image manipulation received the Honorable Master Thesis Award at IPPR 2022 (June 2022)
Paper 'Scene Graph Expansion for Semantics-Guided Image Outpainting' accepted by CVPR 2022 (Mar 2022)
Paper 'Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation' accepted by AAAI 2022 as an oral presentation (Dec 2021)
Paper 'LayoutTransformer: Scene Layout Generation with Conceptual and Spatial Diversity' accepted by CVPR 2021 (Feb 2021)
MEC detection model ranked 7th globally in the LVIS Challenge at the ECCV 2020 workshop (Aug 2020)