3. EVE: Efficient vision-language pre-training with masked prediction and modality-aware MoE. AAAI, 2023. Second author
4. GPT-4's inspiration on multimodal large models in multimodal understanding, generation, and interaction. China Science Fund, 2023. Second author
5. Non-autoregressive image captioning with counterfactuals-critical multi-agent learning. IJCAI, 2020. First author
6. Aligning linguistic words and visual semantic units for image captioning. ACM MM, 2019. First author
7. Normalized and geometry-aware self-attention network for image captioning. CVPR, 2019. First author
8. MSCap: Multi-style image captioning with unpaired stylized text. CVPR, 2019. First author
9. Show, tell and polish: ruminant decoding for image captioning. IEEE Transactions on Multimedia, 2019. First author
10. Sketch-based image retrieval using generative adversarial networks. ACM MM, 2017. First author
Awards and Honors:
1. CAS President's Award for Excellent Students
2. Outstanding Graduate of Beijing Municipal Universities
3. Champion, Pre-trained Video Understanding Competition, ACM MM 2021
4. Champion, VATEX Video Description Challenge (Chinese and English Tracks), CVPR 2020
5. Runner-up (Chinese Track) and Third Place (English Track), VATEX Video Description Challenge, ICCV 2019
6. Champion, COCO-Places Scene Parsing Challenge, ICCV 2017
Patents:
1. CN202210138974.0, A method and system for constructing an information representation model
2. CN202110653593.1, A cross-modal understanding and generation method and device based on a multimodal pre-training model, Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo
Research Experience
1. 2023.07-Present Associate Researcher, Zidong Taichu Large Model Research Center, Institute of Automation, Chinese Academy of Sciences
2. 2021.07-2023.06 Algorithm Researcher, ByteDance AI Lab
Education
1. 2016.09-2021.06 Ph.D., National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. 2012.09-2016.06 B.S., Xi'an Jiaotong University
Background
Research Interests: Multimodal foundation models, multimodal learning, image and video content analysis.