Longteng Guo 郭龙腾

Google Scholar ID: OaGRHWYAAAAJ
Associate Professor, Institute of Automation, Chinese Academy of Sciences (CASIA)
Multimodal Learning
Citations & Impact (all-time)
Citations: 1,695
H-index: 17
i10-index: 21
Publications: 20
Co-authors: 15
Resume
Academic Achievements
  • Selected Publications:
  • 1. SC-Tune: unleashing self-consistent referential comprehension in large vision language models. CVPR, 2024. Co-first author
  • 2. MAMO: Masked multimodal modeling for fine-grained vision-language representation learning. SIGIR, 2023. Co-first author
  • 3. EVE: Efficient vision-language pre-training with masked prediction and modality-aware MoE. AAAI, 2023. Second author
  • 4. GPT-4's inspiration for multimodal large models in multimodal understanding, generation, and interaction. China Science Fund, 2023. Second author
  • 5. Non-autoregressive image captioning with counterfactuals-critical multi-agent learning. IJCAI, 2020. First author
  • 6. Aligning linguistic words and visual semantic units for image captioning. ACM MM, 2019. First author
  • 7. Normalized and geometry-aware self-attention network for image captioning. CVPR, 2020. First author
  • 8. MSCap: Multi-style image captioning with unpaired stylized text. CVPR, 2019. First author
  • 9. Show, tell and polish: ruminant decoding for image captioning. IEEE Transactions on Multimedia, 2019. First author
  • 10. Sketch-based image retrieval using generative adversarial networks. ACM MM, 2017. First author
  • Awards and Honors:
  • 1. CAS President's Award for Excellent Students
  • 2. Outstanding Graduate of Beijing Municipal Universities
  • 3. Champion, Pre-trained Video Understanding Competition, ACM MM 2021
  • 4. Champion, VATEX Video Description Challenge (Chinese and English Tracks), CVPR 2020
  • 5. Runner-up (Chinese Track) and Third Place (English Track), VATEX Video Description Challenge, ICCV 2019
  • 6. Champion, COCO-Places Scene Parsing Challenge, ICCV 2017
  • Patents:
  • 1. CN202210138974.0, A method and system for constructing an information representation model
  • 2. CN202110653593.1, A cross-modal understanding and generation method and device based on a multimodal pre-training model. Inventors: Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo
Research Experience
  • 1. 2023.07-Present Associate Researcher, Zidong Taichu Large Model Research Center, Institute of Automation, Chinese Academy of Sciences
  • 2. 2021.07-2023.06 Algorithm Researcher, ByteDance AI Lab
Education
  • 1. 2016.09-2021.06 Ph.D., National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
  • 2. 2012.09-2016.06 B.S., Xi'an Jiaotong University
Background
  • Research Interests: Multimodal foundation models, multimodal learning, image and video content analysis.