Longteng Guo 郭龙腾

Google Scholar ID: OaGRHWYAAAAJ

Associate Professor, Institute of Automation, Chinese Academy of Sciences (CASIA)

Multimodal Learning

Citations & Impact

Citations (all-time): 1,695

Publications

20 items

Resume

Academic Achievements

Selected Publications:
1. SC-Tune: unleashing self-consistent referential comprehension in large vision language models. CVPR, 2024. Co-first author
2. MAMO: Masked multimodal modeling for fine-grained vision-language representation learning. SIGIR, 2023. Co-first author
3. EVE: Efficient vision-language pre-training with masked prediction and modality-aware MoE. AAAI, 2023. Second author
4. GPT-4's inspiration on multimodal large models in multimodal understanding, generation, and interaction. China Science Fund, 2023. Second author
5. Non-autoregressive image captioning with counterfactuals-critical multi-agent learning. IJCAI, 2020. First author
6. Aligning linguistic words and visual semantic units for image captioning. ACM MM, 2019. First author
7. Normalized and geometry-aware self-attention network for image captioning. CVPR, 2019. First author
8. MSCap: Multi-style image captioning with unpaired stylized text. CVPR, 2019. First author
9. Show, tell and polish: ruminant decoding for image captioning. IEEE Transactions on Multimedia, 2019. First author
10. Sketch-based image retrieval using generative adversarial networks. ACM MM, 2017. First author
Awards and Honors:
1. CAS President's Award for Excellent Students
2. Outstanding Graduate of Beijing Municipal Universities
3. Champion, Pre-trained Video Understanding Competition, ACM MM 2021
4. Champion, VATEX Video Description Challenge (Chinese and English Tracks), CVPR 2020
5. Runner-up (Chinese Track) and Third Place (English Track), VATEX Video Description Challenge, ICCV 2019
6. Champion, COCO-Places Scene Parsing Challenge, ICCV 2017
Patents:
1. CN202210138974.0, A method and system for constructing an information representation model
2. CN202110653593.1, A cross-modal understanding and generation method and device based on a multimodal pre-training model, Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo

Research Experience

1. 2023.07-Present Associate Researcher, Zidong Taichu Large Model Research Center, Institute of Automation, Chinese Academy of Sciences
2. 2021.07-2023.06 Algorithm Researcher, ByteDance AI Lab

Education

1. 2016.09-2021.06 Ph.D., National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. 2012.09-2016.06 B.S., Xi'an Jiaotong University

Background