One paper, 'What if Othello-Playing Language Models Could See?', was accepted to EMNLP 2025.
Presented RAVENEA, a large-scale benchmark for culture-aware multimodal retrieval-augmented visual understanding tasks.
Introduced ChatMotion, a multimodal multi-agent framework for human motion analysis.
Conducted a vector space alignment study to investigate whether vision and language models share concepts.
Launched FoodieQA, a fine-grained image-text dataset for Chinese food culture understanding.
Analyzed the robustness of the retrieval-augmented captioning model SmallCap and proposed methods to improve its performance.
Research Experience
Spent two wonderful years doing research at CoAStaL.
Education
Received a Master's degree in Computer Science from the University of Copenhagen, advised by Prof. Anders Søgaard; currently pursuing a PhD at the University of Copenhagen and the University of Cambridge, advised by Prof. Serge Belongie and Prof. Ivan Vulić, respectively.
Background
An ELLIS PhD student whose research lies at the intersection of natural language processing and computer vision, with a focus on drawing insights from human cognition. Enthusiastic about exploring language grounding in multimodal contexts and investigating the linguistic and cognitive characteristics of models.
Miscellany
Links: GitHub / BlueSky / Google Scholar / X / Email