Published 'Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding' in ACL Findings
Published 'MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training' in EMNLP
Co-authored 'ToolExpNet: Optimizing Multi-Tool Selection in LLMs with Similarity and Dependency-Aware Experience Networks' in ACL Findings
Published 'Aspects are Anchors: Towards Multi-Modal Aspect-Based Sentiment Analysis via Aspect-Driven Alignment and Refinement' in ACM MM
Published 'Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System' in EMNLP
Published 'Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection' in EMNLP
Contributed to 'Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration' in ACM MM
Contributed to 'Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory' in EMNLP
Research Experience
Worked on research projects including visual position encoding for vision-language models and multimodal retrieval-augmented generation.
Education
Master's student in Computer Science at Peking University (advisor: Prof. Yuexian Zou); B.A. in Environmental Engineering from Sun Yat-sen University.
Background
A third-year graduate student in Computer Science at Peking University, advised by Prof. Yuexian Zou. Research focuses on enhancing the general capabilities of Vision-Language Models (VLMs) and developing trustworthy AI agents.
Miscellany
Served as a reviewer for ICLR and for ACL Rolling Review (ARR), covering ACL, EMNLP, and NAACL.